Preface


This book is targeted primarily toward engineers and engineering students of advanced
standing (sophomores, seniors and graduate students). Familiarity with a
computer language is required; knowledge of basic engineering subjects is useful, but 
not essential. 

The text attempts to place emphasis on numerical methods, not programming. 
Most engineers are not programmers, but problem solvers. They want to know what 
methods can be applied to a given problem, what are their strengths and pitfalls and 
how to implement them. Engineers are not expected to write computer code for basic 
tasks from scratch; they are more likely to utilize functions and subroutines that have 
been already written and tested. Thus programming by engineers is largely confined 
to assembling existing pieces of code into a coherent package that solves the problem 
at hand. 

The “piece” of code is usually a function that implements a specific task. For the 
user the details of the code are unimportant. What matters is the interface (what goes 
in and what comes out) and an understanding of the method on which the algorithm 
is based. Since no numerical algorithm is infallible, the importance of understanding 
the underlying method cannot be overemphasized; it is, in fact, the rationale behind 
learning numerical methods. 

This book attempts to conform to the views outlined above. Each numerical 
method is explained in detail and its shortcomings are pointed out. The examples 
that follow individual topics fall into two categories: hand computations that illustrate 
the inner workings of the method, and small programs that show how the computer 
code is utilized in solving a problem. Problems that require programming are marked 
with ■. 

The material consists of the usual topics covered in an engineering course on 
numerical methods: solution of equations, interpolation and data fitting, numerical 
differentiation and integration, solution of ordinary differential equations and eigenvalue
problems. The choice of methods within each topic is tilted toward relevance
to engineering problems. For example, there is an extensive discussion of symmetric,
sparsely populated coefficient matrices in the solution of simultaneous equations. 
In the same vein, the solution of eigenvalue problems concentrates on methods that 
efficiently extract specific eigenvalues from banded matrices. 

An important criterion used in the selection of methods was clarity. Algorithms 
requiring overly complex bookkeeping were rejected regardless of their efficiency and 
robustness. This decision, which was taken with great reluctance, is in keeping with 
the intent to avoid emphasis on programming. 

The selection of algorithms was also influenced by current practice. This disqualified
several well-known historical methods that have been overtaken by more
recent developments. For example, the secant method for finding roots of equations
was omitted as having no advantages over Brent's method. For the same reason, the
multistep methods used to solve differential equations (e.g., Milne and Adams methods)
were left out in favor of the adaptive Runge-Kutta and Bulirsch-Stoer methods.

Notably absent is a chapter on partial differential equations. It was felt that this 
topic is best treated by finite element or boundary element methods, which are outside 
the scope of this book. The finite difference model, which is commonly introduced 
in numerical methods texts, is just too impractical in handling multidimensional 
boundary value problems. 

As usual, the book contains more material than can be covered in a three-credit 
course. The topics that can be skipped without loss of continuity are tagged with an 
asterisk (*). 

The programs listed in this book were tested with MATLAB® 6.5.0 under
Windows® XP. The source code can be downloaded from the book's website at

www.cambridge.org/0521852889 

The author wishes to express his gratitude to the anonymous reviewers and 
Professor Andrew Pytel for their suggestions for improving the manuscript. Credit 
is also due to the authors of Numerical Recipes (Cambridge University Press) whose 
presentation of numerical methods was inspirational in writing this book.

Introduction to MATLAB


1.1 General Information 

Quick Overview 

This chapter is not intended to be a comprehensive manual of MATLAB. Our sole
aim is to provide sufficient information to give you a good start. If you are familiar 
with another computer language, and we assume that you are, it is not difficult to pick 
up the rest as you go. 

MATLAB is a high-level computer language for scientific computing and data visualization
built around an interactive programming environment. It is becoming the
premier platform for scientific computing at educational institutions and research
establishments. The great advantage of an interactive system is that programs can be
tested and debugged quickly, allowing the user to concentrate more on the principles
behind the program and less on programming itself. Since there is no need to compile,
link and execute after each correction, MATLAB programs can be developed in
much shorter time than equivalent FORTRAN or C programs. On the negative side, 
MATLAB does not produce stand-alone applications—the programs can be run only 
on computers that have MATLAB installed. 

MATLAB has other advantages over mainstream languages that contribute to 
rapid program development: 

• MATLAB contains a large number of functions that access proven numerical libraries,
such as LINPACK and EISPACK. This means that many common tasks (e.g.,
solution of simultaneous equations) can be accomplished with a single function 
call. 

• There is extensive graphics support that allows the results of computations to be 
plotted with a few statements. 

• All numerical objects are treated as double-precision arrays. Thus there is no need 
to declare data types and carry out type conversions. 


The syntax of MATLAB resembles that of FORTRAN. To get an idea of the similarities,
let us compare the codes written in the two languages for solution of simultaneous
equations Ax = b by Gauss elimination. Here is the subroutine in FORTRAN 90:

subroutine gauss(A,b,n)
use prec_mod
implicit none
real(DP), dimension(:,:), intent(in out) :: A
real(DP), dimension(:), intent(in out)   :: b
integer, intent(in)                      :: n
real(DP) :: lambda
integer  :: i,k

! -Elimination phase-
do k = 1,n-1
  do i = k+1,n
    if(A(i,k) /= 0) then
      lambda = A(i,k)/A(k,k)
      A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n)
      b(i) = b(i) - lambda*b(k)
    end if
  end do
end do

! -Back substitution phase-
do k = n,1,-1
  b(k) = (b(k) - sum(A(k,k+1:n)*b(k+1:n)))/A(k,k)
end do
return
end subroutine gauss

The statement use prec_mod tells the compiler to load the module prec_mod
(not shown here), which defines the word length DP for floating-point numbers. Also
note the use of array sections, such as A(k,k+1:n), a feature that was not available
in previous versions of FORTRAN.

The equivalent MATLAB function is (MATLAB does not have subroutines): 

function b = gauss(A,b)
n = length(b);

% -Elimination phase-
for k = 1:n-1
    for i = k+1:n
        if A(i,k) ~= 0
            lambda = A(i,k)/A(k,k);
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n);
            b(i) = b(i) - lambda*b(k);
        end
    end
end

% -Back substitution phase-
for k = n:-1:1
    b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k);
end


Simultaneous equations can also be solved in MATLAB with the simple command 
A\b (see below). 

MATLAB can be operated in the interactive mode through its command window, 
where each command is executed immediately upon its entry. In this mode MATLAB 
acts like an electronic calculator. Here is an example of an interactive session for the 
solution of simultaneous equations: 

>> A = [2 1 0; -1 2 2; 0 1 4];   % Input 3 x 3 matrix
>> b = [1; 2; 3];                % Input column vector
>> soln = A\b                    % Solve A*x = b by left division
soln =
    0.2500
    0.5000
    0.6250

The symbol >> is MATLAB's prompt for input. The percent sign (%) marks the
beginning of a comment. A semicolon (;) has two functions: it suppresses printout
of intermediate results and separates the rows of a matrix. Without a terminating
semicolon, the result of a command would be displayed. For example, omission of
the last semicolon in the line defining the matrix A would result in

>> A = [2 1 0; -1 2 2; 0 1 4]
A =
     2     1     0
    -1     2     2
     0     1     4

Functions and programs can be created with the MATLAB editor/debugger and
saved with the .m extension (MATLAB calls them M-files). The file name of a saved
function should be identical to the name of the function. For example, if the function
for Gauss elimination listed above is saved as gauss.m, it can be called just like any
MATLAB function:

» A = [2 1 0; -1 2 2; 0 1 4] ; 

» b = [1; 2; 3]; 

» soln = gauss(A,b) 
soln = 

0.2500 

0.5000 

0.6250 
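
A quick way to gain confidence in a computed solution is to check its residual. The following is a minimal sketch that uses only the built-in left division and norm; the variable name res is ours, not the book's.

```matlab
A = [2 1 0; -1 2 2; 0 1 4];
b = [1; 2; 3];
x = A\b;               % solve A*x = b by left division
res = norm(A*x - b);   % Euclidean norm of the residual; should be near zero
```

A residual close to machine precision suggests the solution is consistent with the equations, although it does not by itself guarantee accuracy for ill-conditioned matrices.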


1.2 Data Types and Variables 

Data Types 

The most commonly used MATLAB data types, or classes, are double, char and 
logical, all of which are considered by MATLAB as arrays. Numerical objects 
belong to the class double, which represents double-precision arrays; a scalar is 
treated as a 1 x 1 array. The elements of a char type array are strings (sequences 
of characters), whereas a logical type array element may contain only 1 (true) or 0 
(false). 

Another important class is function_handle, which is unique to MATLAB. It 
contains information required to find and execute a function. The name of a function 
handle consists of the character @, followed by the name of the function; e.g., @sin. 
Function handles are used as input arguments in function calls. For example, suppose
that we have a MATLAB function plot(func,x1,x2) that plots any user-specified
function func from x1 to x2. The function call to plot sin x from 0 to $\pi$ would be
plot(@sin,0,pi).
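
To make the idea concrete without relying on the hypothetical three-argument plot above, here is a small sketch that creates a handle to the built-in sin and evaluates it. The direct call h(...) is an assumption about the MATLAB release in use; we believe it is accepted in MATLAB 7 and later, whereas feval should work in older releases as well.

```matlab
h = @sin;               % handle to the built-in sine function
y1 = feval(h, pi/2);    % evaluate the function through its handle
y2 = h(pi/2);           % direct call via the handle (newer releases; an assumption)
name = class(h);        % class of the handle object
```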

There are other data types, but we seldom come across them in this text. Additional 
classes can be defined by the user. The class of an object can be displayed with the 
class command. For example, 


» x = 1 + 3i % Complex number 
» class(x) 


ans =
double

Variables

Variable names, which must start with a letter, are case sensitive. Hence xstart and 
xStart represent two different variables. The length of the name is unlimited, but 
only the first N characters are significant. To find N for your installation of MATLAB, 
use the command namelengthmax: 

» namelengthmax 
ans = 

63 

Variables that are defined within a MATLAB function are local in their scope. 
They are not available to other parts of the program and do not remain in memory 
after exiting the function (this applies to most programming languages). However, 
variables can be shared between a function and the calling program if they are declared 
global. For example, by placing the statement global X Y in a function as well as 
the calling program, the variables X and Y are shared between the two program units. 
The recommended practice is to use capital letters for global variables. 

MATLAB contains several built-in constants and special variables, most important 
of which are 


ans        Default name for results
eps        Smallest number for which 1 + eps > 1
inf        Infinity
NaN        Not a number
i or j     $\sqrt{-1}$
pi         $\pi$
realmin    Smallest usable positive number
realmax    Largest usable positive number


Here are a few examples:

>> warning off % Suppresses print of warning messages
>> 5/0
ans =
   Inf
>> 0/0
ans =
   NaN
>> 5*NaN % Most operations with NaN result in NaN
ans =
   NaN
>> NaN == NaN % Different NaN's are not equal!
ans =
     0
>> eps
ans =
  2.2204e-016
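
Because NaN never compares equal to anything, including itself, an equality test cannot be used to detect it. A minimal sketch of the usual alternative, the built-in isnan, follows; the variable names are illustrative.

```matlab
x = [1 NaN 3 0/0];
eqTest = (x == NaN);   % all false: NaN never compares equal
mask   = isnan(x);     % true where x is NaN: [0 1 0 1]
nBad   = sum(mask);    % number of NaN entries
```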

Arrays 

Arrays can be created in several ways. One of them is to type the elements of the array 
between brackets. The elements in each row must be separated by blanks or commas. 
Here is an example of generating a 3 x 3 matrix: 

>> A = [ 2 -1  0
        -1  2 -1
         0 -1  1]
A =
     2    -1     0
    -1     2    -1
     0    -1     1

The elements can also be typed on a single line, separating the rows with semicolons:

>> A = [2 -1 0; -1 2 -1; 0 -1 1]
A =
     2    -1     0
    -1     2    -1
     0    -1     1

Unlike most computer languages, MATLAB differentiates between row and column
vectors (this peculiarity is a frequent source of programming and input errors).
For example,

>> b = [1 2 3] % Row vector
b =
     1     2     3

» b = [1; 2; 3] % Column vector 

b = 

1 

2 

3 

>> b = [1 2 3]' % Transpose of row vector
b =
     1
     2
     3

The single quote (') is the transpose operator in MATLAB; thus b' is the transpose
of b.

The elements of a matrix, such as

$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$

can be accessed with the statement A(i,j), where i and j are the row and column
numbers, respectively. A section of an array can be extracted by the use of colon
notation. Here is an illustration:


>> A = [8 1 6; 3 5 7; 4 9 2]
A =
     8     1     6
     3     5     7
     4     9     2
>> A(2,3) % Element in row 2, column 3
ans =
     7
>> A(:,2) % Second column
ans =
     1
     5
     9
>> A(2:3,2:3) % The 2 x 2 submatrix in lower right corner
ans =
     5     7
     9     2

Array elements can also be accessed with a single index. Thus A(i) extracts the
ith element of A, counting the elements down the columns. For example, A(7) and
A(1,3) would extract the same element from a 3 x 3 matrix.
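
The column-wise counting can be checked directly. The sketch below uses the built-in sub2ind to convert a (row, column) pair into the equivalent single index; the variable names are ours.

```matlab
A = [8 1 6; 3 5 7; 4 9 2];
a1 = A(7);                     % seventh element, counting down the columns
a2 = A(1,3);                   % row 1, column 3: the same element
k  = sub2ind(size(A), 1, 3);   % single index equivalent to (1,3)
col = A(:)';                   % all elements in storage order, as a row
```

Here col lists the elements column by column (8 3 4 1 5 9 6 7 2), which makes it easy to see why the seventh entry is the one in row 1, column 3.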

Cells 

A cell array is a sequence of arbitrary objects. Cell arrays can be created by enclosing 
their contents between braces {}. For example, a cell array c consisting of three cells 
can be created by 

>> c = {[1 2 3], 'one two three', 6 + 7i}
c =
    [1x3 double]    'one two three'    [6.0000+ 7.0000i]

As seen above, the contents of some cells are not printed in order to save space.
If all contents are to be displayed, use the celldisp command:

>> celldisp(c)
c{1} =
     1     2     3
c{2} =
one two three
c{3} =
   6.0000 + 7.0000i

Braces are also used to extract the contents of the cells: 

>> c{1} % First cell
ans =
     1     2     3
>> c{1}(2) % Second element of first cell
ans =
     2
>> c{2} % Second cell
ans =
one two three


Strings 

A string is a sequence of characters; it is treated by MATLAB as a character array. Strings 
are created by enclosing the characters between single quotes. They are concatenated 
with the function strcat, whereas a colon operator (:) is used to extract a portion of 
the string. For example, 


>> s1 = 'Press return to exit';   % Create a string
>> s2 = ' the program';           % Create another string
>> s3 = strcat(s1,s2)             % Concatenate s1 and s2
s3 =
Press return to exit the program
>> s4 = s1(1:12)                  % Extract chars. 1-12 of s1
s4 =
Press return
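
One detail worth knowing is that strcat discards trailing white space of character-array arguments, whereas concatenation with brackets keeps every character. The sketch below illustrates the difference; the strings are made up for the purpose.

```matlab
s1 = 'Press return ';    % note the trailing blank
s2 = 'to exit';
t1 = strcat(s1, s2);     % trailing blank of s1 is dropped
t2 = [s1 s2];            % brackets keep every character
```

If blanks at the end of a string matter, bracket concatenation is the safer choice.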


1.3 Operators 

Arithmetic Operators 

MATLAB supports the usual arithmetic operators: 


+    Addition
-    Subtraction
*    Multiplication
^    Exponentiation


When applied to matrices, they perform the familiar matrix operations, as illustrated
below.

>> A = [1 2 3; 4 5 6]; B = [7 8 9; 0 1 2];
>> A + B % Matrix addition
ans =
     8    10    12
     4     6     8
>> A*B' % Matrix multiplication
ans =
    50     8
   122    17
>> A*B % Matrix multiplication fails
??? Error using ==> * % due to incompatible dimensions
Inner matrix dimensions must agree.


There are two division operators in MATLAB: 


/    Right division
\    Left division


If a and b are scalars, the right division a/b results in a divided by b, whereas the left
division a\b is equivalent to b/a. In the case where A and B are matrices, A/B returns the
solution of X*B = A and A\B yields the solution of A*X = B.
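
The two division operators are easy to confuse, so a short numerical check may help. The following sketch, with matrices chosen arbitrarily by us, verifies which equation each operator solves.

```matlab
a = 2; b = 8;
r = a/b;                    % right division: a divided by b
l = a\b;                    % left division: same as b/a
A = [2 1; 1 3]; B = [3 5; 4 10];
X = A\B;                    % solves A*X = B
Y = B/A;                    % solves Y*A = B
errL = norm(A*X - B);       % should be near zero
errR = norm(Y*A - B);       % should be near zero
```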

Often we need to apply the *, / and ^ operations to matrices in an element-by-element
fashion. This can be done by preceding the operator with a period (.) as
follows:

.*    Element-wise multiplication
./    Element-wise division
.^    Element-wise exponentiation


For example, the computation $C_{ij} = A_{ij}B_{ij}$ can be accomplished with

>> A = [1 2 3; 4 5 6]; B = [7 8 9; 0 1 2];
>> C = A.*B
C =
     7    16    27
     0     5    12

Comparison Operators

The comparison (relational) operators return 1 for true and 0 for false. These operators
are

<     Less than
>     Greater than
<=    Less than or equal to
>=    Greater than or equal to
==    Equal to
~=    Not equal to


The comparison operators always act element-wise on matrices; hence they result in 
a matrix of logical type. For example, 

>> A = [1 2 3; 4 5 6]; B = [7 8 9; 0 1 2];
>> A > B
ans =
     0     0     0
     1     1     1

Logical Operators 

The logical operators in MATLAB are

&    AND
|    OR
~    NOT


They are used to build compound relational expressions, an example of which is 
shown below. 

>> A = [1 2 3; 4 5 6]; B = [7 8 9; 0 1 2];
>> (A > B) | (B > 5)
ans =
     1     1     1
     1     1     1
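
A logical array produced by such an expression can also be used as an index, which is a common way of selecting or modifying elements that satisfy a condition. The sketch below is our illustration of that idea and is not taken from the original text.

```matlab
A = [1 2 3; 4 5 6];
mask = (A > 1) & ~(A > 4);   % true where 1 < A <= 4
vals = A(mask);              % selected elements, returned in column order
A(mask) = 0;                 % assign to the selected elements only
```

Note that vals is a column vector containing 4, 2 and 3, because the elements are collected down the columns of A.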
1.4 Flow Control

Conditionals 

if, else, elseif 
The if construct 


if condition 
block 

end 


executes the block of statements if the condition is true. If the condition is false,
the block is skipped. The if conditional can be followed by any number of elseif
constructs:


if condition 
block 

elseif condition 
block 

end 


which work in the same manner. The else clause 


else 

block 

end 


can be used to define the block of statements which are to be executed if none of 
the if-elseif clauses are true. The function signum below illustrates the use of the 
conditionals. 

function sgn = signum(a)
if a > 0
    sgn = 1;
elseif a < 0
    sgn = -1;
else
    sgn = 0;
end

>> signum(-1.5)
ans =
    -1

switch 

The switch construct is 


switch expression
    case value1
        block
    case value2
        block
    otherwise
        block
end

Here the expression is evaluated and the control is passed to the case that matches the 
value. For instance, if the value of expression is equal to value2, the block of statements 
following case value2 is executed. If the value of expression does not match any 
of the case values, the control passes to the optional otherwise block. Here is an 
example: 

function y = trig(func,x)
switch func
    case 'sin'
        y = sin(x);
    case 'cos'
        y = cos(x);
    case 'tan'
        y = tan(x);
    otherwise
        error('No such function defined')
end

>> trig('tan',pi/3)
ans =
    1.7321

Loops

while 

The while construct 


while condition
    block
end

executes a block of statements if the condition is true. After execution of the block, 
condition is evaluated again. If it is still true, the block is executed again. This process 
is continued until the condition becomes false. 

The following example computes the number of years it takes for a $1000 principal
to grow to $10,000 at 6% annual interest.

>> p = 1000; years = 0;
>> while p < 10000
       years = years + 1;
       p = p*(1 + 0.06);
   end
>> years
years =
    40
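
The loop result can be cross-checked with a closed-form estimate: the principal first reaches the target in the smallest integer n for which $p_0(1 + r)^n \ge p_{\text{target}}$, that is, $n = \lceil \ln(p_{\text{target}}/p_0)/\ln(1 + r) \rceil$. A minimal sketch, with variable names of our choosing:

```matlab
p0 = 1000; target = 10000; rate = 0.06;
years = ceil(log(target/p0)/log(1 + rate));   % smallest n with p0*(1+rate)^n >= target
```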


for 

The for loop requires a target and a sequence over which the target loops. The form 
of the construct is 

for target = sequence 
block 

end 

For example, to compute cos x from x = 0 to $\pi/2$ at increments of $\pi/10$ we could
use

>> for n = 0:5 % n loops over the sequence 0 1 2 3 4 5
       y(n+1) = cos(n*pi/10);
   end
>> y
y =
    1.0000    0.9511    0.8090    0.5878    0.3090    0.0000

Loops should be avoided whenever possible in favor of vectorized expressions,
which execute much faster. A vectorized solution to the last computation would be


» n = 0:5; 

» y = cos(n*pi/10) 

y = 

1.0000 0.9511 0.8090 0.5878 0.3090 0.0000 
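
When a loop cannot be avoided, a common precaution (our suggestion, not part of the original text) is to preallocate the result array with zeros so that MATLAB does not have to enlarge it in every iteration. The sketch below compares such a loop with the vectorized expression.

```matlab
n = 0:5;
y = zeros(1, length(n));      % preallocate the result
for k = 1:length(n)
    y(k) = cos(n(k)*pi/10);   % fill one element per iteration
end
yVec = cos(n*pi/10);          % vectorized equivalent
```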

break 

Any loop can be terminated by the break statement. Upon encountering a break
statement, the control is passed to the first statement outside the loop. In the following
example the function buildvec constructs a row vector of arbitrary length
by prompting for its elements. The process is terminated when an empty element is
encountered.

function x = buildvec
for i = 1:1000
    elem = input('==> ');   % Prompts for input of element
    if isempty(elem)        % Check for empty element
        break
    end
    x(i) = elem;
end

>> x = buildvec
==> 3
==> 5
==> 7
==> 2
==>
x =
     3     5     7     2

continue 

When the continue statement is encountered in a loop, the control is passed to
the next iteration without executing the statements in the current iteration. As an
illustration, consider the following function that strips all the blanks from the string s1:

function s2 = strip(s1)
s2 = '';                          % Create an empty string
for i = 1:length(s1)
    if s1(i) == ' '
        continue
    else
        s2 = strcat(s2,s1(i));    % Concatenation
    end
end

>> s2 = strip('This is too bad')
s2 =
Thisistoobad

return 

A function normally returns to the calling program when it runs out of statements.
However, the function can be forced to exit with the return command. In the example
below, the function solve uses the Newton-Raphson method to find the zero
of $f(x) = \sin x - 0.5x$. The input x (guess of the solution) is refined in successive
iterations using the formula $x \leftarrow x + \Delta x$, where $\Delta x = -f(x)/f'(x)$, until the change
$\Delta x$ becomes sufficiently small. The procedure is then terminated with the return
statement. The for loop assures that the number of iterations does not exceed 30,
which should be more than enough for convergence.

function x = solve(x)
for numIter = 1:30
    dx = -(sin(x) - 0.5*x)/(cos(x) - 0.5);   % -f(x)/f'(x)
    x = x + dx;
    if abs(dx) < 1.0e-6                      % Check for convergence
        return
    end
end
error('Too many iterations')

>> x = solve(2)
x =
    1.8955


error 

Execution of a program can be terminated and a message displayed with the error
function

error('message')

For example, the following program lines determine the dimensions of a matrix and
abort the program if the dimensions are not equal.

[m,n] = size(A);   % m = no. of rows; n = no. of cols.
if m ~= n
    error('Matrix must be square')
end


1.5 Functions 

Function Definition 

The body of a function must be preceded by the function definition line 


function [output_args] = function_name(input_arguments)


The input and output arguments must be separated by commas. The number of
arguments may be zero. If there is only one output argument, the enclosing brackets
may be omitted.

To make the function accessible to other program units, it must be saved under
the file name function_name.m. This file may contain other functions, called subfunctions.
The subfunctions can be called only by the primary function function_name or
other subfunctions in the file; they are not accessible to other program units.

Calling Functions 

A function may be called with fewer arguments than appear in the function definition.
The number of input and output arguments used in the function call can be
determined by the functions nargin and nargout, respectively. The following example
shows a modified version of the function solve that involves two input and two
output arguments. The error tolerance epsilon is an optional input that may be used
to override the default value 1.0e-6. The output argument numIter, which contains
the number of iterations, may also be omitted from the function call.

function [x,numIter] = solve(x,epsilon)
if nargin == 1             % Specify default value if
    epsilon = 1.0e-6;      % second input argument is
end                        % omitted in function call
for numIter = 1:100
    dx = -(sin(x) - 0.5*x)/(cos(x) - 0.5);
    x = x + dx;
    if abs(dx) < epsilon   % Converged; return to
        return             % calling program
    end
end
error('Too many iterations')

>> x = solve(2) % numIter not printed
x =
    1.8955
>> [x,numIter] = solve(2) % numIter is printed
x =
    1.8955
numIter =
     4
>> format long
>> x = solve(2,1.0e-12) % Solving with extra precision
x =
   1.89549426703398


Evaluating Functions 

Let us consider a slightly different version of the function solve shown below. The
expression for dx, namely $\Delta x = -f(x)/f'(x)$, is now coded in the function myfunc,
so that solve contains a call to myfunc. This will work fine, provided that myfunc is
stored under the file name myfunc.m so that MATLAB can find it.

function [x,numIter] = solve(x,epsilon)
if nargin == 1; epsilon = 1.0e-6; end
for numIter = 1:30
    dx = myfunc(x);
    x = x + dx;
    if abs(dx) < epsilon; return; end
end
error('Too many iterations')

function y = myfunc(x)
y = -(sin(x) - 0.5*x)/(cos(x) - 0.5);

>> x = solve(2)
x =
    1.8955

In the above version of solve the function returning dx is stuck with the name
myfunc. If myfunc is replaced with another function name, solve will not work unless
the corresponding change is made in its code. In general, it is not a good idea to alter
computer code that has been tested and debugged; all data should be communicated
to a function through its arguments. MATLAB makes this possible by passing the
function handle of myfunc to solve as an argument, as illustrated below.

function [x,numIter] = solve(func,x,epsilon)
if nargin == 2; epsilon = 1.0e-6; end
for numIter = 1:30
    dx = feval(func,x);   % feval is a MATLAB function for
    x = x + dx;           % evaluating a passed function
    if abs(dx) < epsilon; return; end
end
error('Too many iterations')

>> x = solve(@myfunc,2) % @myfunc is the function handle
x =
    1.8955

The call solve(@myfunc,2) creates a function handle to myfunc and passes it
to solve as an argument. Hence the variable func in solve contains the handle
to myfunc. A function passed to another function by its handle is evaluated by the
MATLAB function

feval(function_handle,arguments)

It is now possible to use solve to find a zero of any $f(x)$ by coding the function
$\Delta x = -f(x)/f'(x)$ and passing its handle to solve.
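
As an illustration of reusing the same iteration for a different equation, the sketch below finds the zero of $f(x) = x^2 - 2$, for which $\Delta x = -(x^2 - 2)/(2x)$. To keep the example self-contained, the loop of solve is written out inline, and the handle is created with an anonymous function, which is an assumption about the MATLAB release (we believe anonymous functions require MATLAB 7 or later). The name newtonStep is ours.

```matlab
newtonStep = @(x) -(x^2 - 2)/(2*x);   % -f(x)/f'(x) for f(x) = x^2 - 2
x = 1;                                % starting guess
for numIter = 1:30
    dx = feval(newtonStep, x);        % evaluate the passed function
    x = x + dx;
    if abs(dx) < 1.0e-12; break; end  % converged
end
```

After the loop, x should agree with sqrt(2) to within the tolerance.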

In-Line Functions 

If the function is not overly complicated, it can also be represented as an inline
object:

function_name = inline('expression','var1','var2',...)

where expression specifies the function and var1, var2, ... are the names of the independent
variables. Here is an example:

>> myfunc = inline('x^2 + y^2','x','y');
>> myfunc(3,5)
ans =
    34

The advantage of an in-line function is that it can be embedded in the body of
the code; it does not have to reside in an M-file.
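
In later MATLAB releases the same effect is usually obtained with an anonymous function, which also lives in the body of the code. The following is a sketch under the assumption that the release in use supports the @(...) syntax; g is a hypothetical element-wise variant added by us.

```matlab
myfunc = @(x,y) x^2 + y^2;     % anonymous function of two variables
val = myfunc(3,5);             % same result as the inline object
g = @(x,y) x.^2 + y.^2;        % element-wise version that accepts arrays
vals = g([1 2],[3 4]);         % evaluates both pairs at once
```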


1.6 Input/Output 

Reading Input 

The MATLAB function for receiving user input is 


value = input('prompt')

It displays a prompt and then waits for input. If the input is an expression, it is evaluated
and returned in value. The following two samples illustrate the use of input:

>> a = input('Enter expression: ')
Enter expression: tan(0.15)
a =
    0.1511

>> s = input('Enter string: ')
Enter string: 'Black sheep'
s =
Black sheep


Printing Output 

As mentioned before, the result of a statement is printed if the statement does not end 
with a semicolon. This is the easiest way of displaying results in MATLAB. Normally 
MATLAB displays numerical results with about five digits, but this can be changed 
with the format command: 


format long     switches to 16-digit display
format short    switches to 5-digit display


To print formatted output, use the fprintf function: 


fprintf('format',list)

where format contains formatting specifications and list is the list of items to be
printed, separated by commas. Typically used formatting specifications are

%w.df    Floating point notation
%w.de    Exponential notation
\n       Newline character

where w is the width of the field and d is the number of digits after the decimal point.
Line break is forced by the newline character. The following example prints a formatted
table of sin x vs. x at intervals of 0.2:

>> x = 0:0.2:1;
>> for i = 1:length(x)
       fprintf('%4.1f %11.6f\n',x(i),sin(x(i)))
   end
 0.0    0.000000
 0.2    0.198669
 0.4    0.389418
 0.6    0.564642
 0.8    0.717356
 1.0    0.841471
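
The same table can be printed without a loop, because fprintf reuses the format for as long as there are items left and traverses a matrix argument column by column. A minimal sketch follows; the names T and firstRow are ours, and sprintf is used only to capture one formatted row as a string.

```matlab
x = 0:0.2:1;
T = [x; sin(x)];                             % one column per printed row
fprintf('%4.1f %11.6f\n', T);                % format is reused for each column of T
firstRow = sprintf('%4.1f %11.6f', T(:,1));  % first row as a string
```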


1.7 Array Manipulation 

Creating Arrays 

We learned before that an array can be created by typing its elements between brackets: 

» x = [0 0.25 0.5 0.75 1] 
x = 

0 0.2500 0.5000 0.7500 1.0000 

Colon Operator 

Arrays with equally spaced elements can also be constructed with the colon operator. 

x = first_elem:increment:last_elem

For example,

>> x = 0:0.25:1
x =
         0    0.2500    0.5000    0.7500    1.0000

linspace

Another means of creating an array with equally spaced elements is the linspace
function. The statement

x = linspace(xfirst,xlast,n)

creates an array of n elements starting with xfirst and ending with xlast. Here is an
illustration: 

» x = linspace(0,1,5) 
x = 

0 0.2500 0.5000 0.7500 1.0000 


logspace 

The function logspace is the logarithmic counterpart of linspace. The call 

x = logspace(zfirst,zlast,n)

creates n logarithmically spaced elements starting with $x = 10^{\text{zfirst}}$ and ending with
$x = 10^{\text{zlast}}$. Here is an example:

» x = logspace(0,1,5) 
x = 

1.0000 1.7783 3.1623 5.6234 10.0000 
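
The relationship between the two functions can be verified numerically: logspace(zfirst,zlast,n) should give the same values as raising 10 to the powers produced by linspace. A short sketch, with variable names chosen by us:

```matlab
x1 = logspace(0, 1, 5);        % 5 points from 10^0 to 10^1
x2 = 10.^linspace(0, 1, 5);    % same points built from linspace
maxDiff = max(abs(x1 - x2));   % should be near zero
```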

zeros 

The function call 

X = zeros(m,n)

returns a matrix of m rows and n columns that is filled with zeros. When the function
is called with a single argument, e.g., zeros(n), an n x n matrix is created.

ones

X = ones(m,n)

The function ones works in the same manner as zeros, but fills the matrix with ones.

rand

X = rand(m,n)

This function returns a matrix filled with random numbers between 0 and 1.

eye

The function eye

X = eye(n)

creates an n x n identity matrix.

Array Functions 

There are numerous array functions in MATLAB that perform matrix operations and 
other useful tasks. Here are a few basic functions: 

length 

The length n (number of elements) of a vector x can be determined with the function
length:

n = length(x)


size 

If the function size is called with a single input argument:

[m,n] = size(X)

it determines the number of rows m and number of columns n in the matrix X. If
called with two input arguments:

m = size(X,dim)

it returns the length of X in the specified dimension (dim = 1 yields the number of
rows, and dim = 2 gives the number of columns).

reshape 

The reshape function is used to rearrange the elements of a matrix. The call 


Y = reshape(X,m,n)

returns an m x n matrix the elements of which are taken from matrix X in the column-wise
order. The total number of elements in X must be equal to m x n. Here is an
example:

>> a = 1:2:11
a =
     1     3     5     7     9    11
>> A = reshape(a,2,3)
A =
     1     5     9
     3     7    11
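
Since reshape always fills and reads in column-wise order, reshaping back to a row recovers the original vector, and a transpose is needed if the elements are to run along the rows instead. The sketch below illustrates both points; the names back and At are ours.

```matlab
a = 1:2:11;
A = reshape(a, 2, 3);        % columns are filled first
[m, n] = size(A);            % m = 2 rows, n = 3 columns
back = reshape(A, 1, m*n);   % recovers the original row vector
At = reshape(a, 3, 2)';      % elements of a laid out along the rows
```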


dot 


a = dot(x,y) 


This function returns the dot product of two vectors x and y which must be of the 
same length. 

prod 


a = prod(X)


For a vector X, prod(X) returns the product of its elements. If X is a matrix, then a is a
row vector containing the products over each column. For example,

» a = [1 2 3 4 5 6]; 

» A = reshape(a,2,3) 

A =
     1     3     5
     2     4     6

» prod(a) 
ans = 

720 

» prod(A) 
ans = 

2 12 30 

sum 


a = sum(X) 


This function is similar to prod, except that it returns the sum of the elements. 
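
Both sum and prod accept an optional second argument that selects the dimension along which to operate, which is useful when row totals rather than column totals are wanted. A brief sketch, with variable names of our choosing:

```matlab
A = [1 3 5; 2 4 6];
colSums = sum(A);       % sums over each column (row vector)
rowSums = sum(A, 2);    % sums over each row (column vector)
total   = sum(A(:));    % sum of all elements
rowProd = prod(A, 2);   % products over each row
```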
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cross 


C= cross(a,fo) 

The function cross computes the cross product: c = ax b, where vectors a and b 
must be of length 3. 
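
The array functions described above can be tried together in a short script. The following is only a sketch; the variable names (x, y, nc and so on) are illustrative choices, not taken from the text.

```matlab
% Sketch: exercising the basic array functions on small test data.
x = [1 2 3];
y = [4 5 6];
A = reshape(1:6,2,3);      % columns are filled first: [1 3 5; 2 4 6]

n      = length(x);        % number of elements in x
[m,nc] = size(A);          % number of rows and columns of A
d      = dot(x,y);         % 1*4 + 2*5 + 3*6
p      = prod(A);          % products over each column of A
s      = sum(A);           % sums over each column of A
c      = cross(x,y);       % cross product of two 3-element vectors

fprintf('n = %d, size(A) = %d x %d, dot(x,y) = %d\n', n, m, nc, d);
disp(p); disp(s); disp(c);
```

Note that prod and sum operate column by column when given a matrix, which is why p and s are row vectors with one entry per column of A.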


1.8 Writing and Running Programs 

MATLAB has two windows available for typing program lines: the command window 
and the editor/debugger. The command window is always in the interactive mode, so 
that any statement entered into the window is immediately processed. The interactive 
mode is a good way to experiment with the language and try out programming ideas. 

MATLAB opens the editor window when a new M-file is created, or an existing file 
is opened. The editor window is used to type and save programs (called script files in 
MATLAB) and functions. One could also use a text editor to enter program lines, but 
the MATLAB editor has MATLAB-specific features, such as color coding and automatic 
indentation, that make work easier. Before a program or function can be executed, it 
must be saved as a MATLAB M-file (recall that these files have the . m extension). A 
program can be run by invoking the run command from the editor’s debug menu. 

When a function is called for the first time during a program run, it is compiled 
into P-code (pseudo-code) to speed up execution in subsequent calls to the function. 
One can also create the P-code of a function and save it on disk by issuing the command 


pcode function_name 


MATLAB will then load the P-code (which has the . p extension) into the memory 
rather than the text file. 

The variables created during a MATLAB session are saved in the MATLAB 
workspace until they are cleared. Listing of the saved variables can be displayed by the 
command who. If greater detail about the variables is required, type whos. Variables 
can be cleared from the workspace with the command 


clear a b ... 


which clears the variables a, b, .... If the list of variables is omitted, all variables are 
cleared. 

Assistance on any MATLAB function is available by typing 


help function_name 


in the command window. 


1.9 Plotting 


MATLAB has extensive plotting capabilities. Here we illustrate some basic commands 
for two-dimensional plots. The example below plots sin x and cos x on the same plot. 


» x = 0:0.2:pi;           % Create x-array 
» y = sin(x);             % Create y-array 
» plot(x,y,'k:o')         % Plot x-y points with specified color 
                          % and symbol ('k' = black, 'o' = circles) 
» hold on                 % Allow overwriting of current plot 
» z = cos(x);             % Create z-array 
» plot(x,z,'k:x')         % Plot x-z points ('x' = crosses) 
» grid on                 % Display coordinate grid 
» xlabel('x')             % Display label for x-axis 
» ylabel('y')             % Display label for y-axis 
» gtext('sin x')          % Create mouse-movable text 
» gtext('cos x') 























A function stored in an M-file can be plotted with a single command, as shown 
below. 

function y = testfunc(x)          % Stored function 
y = (x.^3).*sin(x) - 1./x; 

» fplot(@testfunc,[1 20])         % Plot from x = 1 to 20 
» grid on 



The plots appearing in this book from here on were not produced by MATLAB. 
We used the copy/paste operation to transfer the numerical data to a spreadsheet 
and then let the spreadsheet create the plot. This resulted in plots more suited for 
publication. 

Solve the simultaneous equations Ax = b 


2.1 Introduction 

In this chapter we look at the solution of n linear, algebraic equations in n unknowns. 
It is by far the longest and arguably the most important topic in the book. There 
is a good reason for this—it is almost impossible to carry out numerical analysis 
of any sort without encountering simultaneous equations. Moreover, equation sets 
arising from physical problems are often very large, consuming a lot of computational 
resources. It is usually possible to reduce the storage requirements and the run 
time by exploiting special properties of the coefficient matrix, such as sparseness 
(most elements of a sparse matrix are zero). Hence there are many algorithms dedicated 
to the solution of large sets of equations, each one being tailored to a particular 
form of the coefficient matrix (symmetric, banded, sparse, etc.). A well-known 
collection of these routines is LAPACK (Linear Algebra PACKage), originally written in 
Fortran 77.¹ 

We cannot possibly discuss all the special algorithms in the limited space available. 
The best we can do is to present the basic methods of solution, supplemented 
by a few useful algorithms for banded and sparse coefficient matrices. 

¹ LAPACK is the successor of LINPACK, a 1970s and 80s collection of Fortran subroutines. 

Notation 

A system of algebraic equations has the form 

$$
\begin{aligned}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1n}x_n &= b_1 \\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2n}x_n &= b_2 \\
A_{31}x_1 + A_{32}x_2 + \cdots + A_{3n}x_n &= b_3 \\
&\ \,\vdots \\
A_{n1}x_1 + A_{n2}x_2 + \cdots + A_{nn}x_n &= b_n
\end{aligned}
\tag{2.1}
$$

where the coefficients $A_{ij}$ and the constants $b_j$ are known, and $x_i$ represent the unknowns. 
In matrix notation the equations are written as 

$$
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\tag{2.2}
$$

or, simply 

$$
\mathbf{A}\mathbf{x} = \mathbf{b} \tag{2.3}
$$


A particularly useful representation of the equations for computational purposes 
is the augmented coefficient matrix, obtained by adjoining the constant vector b to 
the coefficient matrix A in the following fashion: 



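$$
[\mathbf{A}\,|\,\mathbf{b}] =
\left[\begin{array}{cccc|c}
A_{11} & A_{12} & \cdots & A_{1n} & b_1 \\
A_{21} & A_{22} & \cdots & A_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
A_{n1} & A_{n2} & \cdots & A_{nn} & b_n
\end{array}\right]
$$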


Uniqueness of Solution 

A system of n linear equations in n unknowns has a unique solution, provided that 
the coefficient matrix is nonsingular, i.e., if |A| ≠ 0. The rows and 
columns of a nonsingular matrix are linearly independent in the sense that no row (or 
column) is a linear combination of other rows (or columns). 

If the coefficient matrix is singular, the equations may have an infinite number of 
solutions, or no solutions at all, depending on the constant vector. As an illustration, 
take the equations 


2x + y = 3          4x + 2y = 6 

Since the second equation can be obtained by multiplying the first equation by two, 
any combination of x and y that satisfies the first equation is also a solution of the 
second equation. The number of such combinations is infinite. On the other hand, 
the equations 


2x + y = 3          4x + 2y = 0 

have no solution because the second equation, being equivalent to 2x + y = 0, contradicts 
the first one. Therefore, any solution that satisfies one equation cannot satisfy 
the other one. 
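
The two singular cases above can be told apart numerically by comparing the rank of the coefficient matrix with the rank of the augmented matrix. This rank criterion is a standard result of linear algebra that the text does not state, so the following is only a sketch; the names rA, rA1 and rA2 are illustrative.

```matlab
% Sketch: classify the two singular systems from the text using rank.
A  = [2 1; 4 2];           % singular coefficient matrix, |A| = 0
b1 = [3; 6];               % consistent constant vector
b2 = [3; 0];               % inconsistent constant vector

rA  = rank(A);
rA1 = rank([A b1]);        % rank of the augmented matrix [A|b1]
rA2 = rank([A b2]);        % rank of the augmented matrix [A|b2]

fprintf('rank(A) = %d, rank([A b1]) = %d, rank([A b2]) = %d\n', rA, rA1, rA2);
% rank([A b]) equal to rank(A), both less than n: infinitely many solutions
% rank([A b]) greater than rank(A):               no solution
```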

Ill-Conditioning 

An obvious question is: what happens when the coefficient matrix is almost singular; 
i.e., if |A| is very small? In order to determine whether the determinant of the coefficient 
matrix is “small,” we need a reference against which the determinant can be measured. 
This reference is called the norm of the matrix, denoted by ||A||. We can then say that 
the determinant is small if 

$$
|\mathbf{A}| \ll \|\mathbf{A}\|
$$

Several norms of a matrix have been defined in existing literature, such as 

$$
\|\mathbf{A}\|_e = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}^2}
\qquad
\|\mathbf{A}\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |A_{ij}|
\tag{2.5a}
$$

A formal measure of conditioning is the matrix condition number, defined as 

$$
\mathrm{cond}(\mathbf{A}) = \|\mathbf{A}\|\,\|\mathbf{A}^{-1}\| \tag{2.5b}
$$

If this number is close to unity, the matrix is well-conditioned. The condition number 
increases with the degree of ill-conditioning, reaching infinity for a singular matrix. 
Note that the condition number is not unique, but depends on the choice of the matrix 
norm. Unfortunately, the condition number is expensive to compute for large matrices. 
In most cases it is sufficient to gauge conditioning by comparing the determinant 
with the magnitudes of the elements in the matrix. 

If the equations are ill-conditioned, small changes in the coefficient matrix result 
in large changes in the solution. As an illustration, consider the equations 

2x + y = 3          2x + 1.001y = 0 

that have the solution x = 1501.5, y = -3000. Since |A| = 2(1.001) - 2(1) = 0.002 is 
much smaller than the coefficients, the equations are ill-conditioned. The effect of 
ill-conditioning can be verified by changing the second equation to 2x + 1.002y = 0 
and re-solving the equations. The result is x = 751.5, y = -1500. Note that a 0.1% 
change in the coefficient of y produced a 100% change in the solution. 

Numerical solutions of ill-conditioned equations are not to be trusted. The reason 
is that the inevitable roundoff errors during the solution process are equivalent to introducing 
small changes into the coefficient matrix. This in turn introduces large errors 
into the solution, the magnitude of which depends on the severity of ill-conditioning. 
In suspect cases the determinant of the coefficient matrix should be computed so that 
the degree of ill-conditioning can be estimated. This can be done during or after the 
solution with only a small computational effort. 
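
The ill-conditioned pair of equations discussed above can be reproduced in a few lines. This is a sketch that assumes MATLAB's built-in backslash solver and the built-in functions det and cond (which returns the 2-norm condition number by default); none of these calls appear in the text at this point.

```matlab
% Sketch: effect of a 0.1% change in one coefficient of an
% ill-conditioned 2 x 2 system.
A1 = [2 1; 2 1.001];       % original coefficient matrix
A2 = [2 1; 2 1.002];       % perturbed coefficient matrix
b  = [3; 0];

x1 = A1\b;                 % approximately [1501.5; -3000]
x2 = A2\b;                 % approximately [751.5; -1500]
d1 = det(A1);              % approximately 0.002
c1 = cond(A1);             % much larger than 1 for this matrix

fprintf('det(A1) = %g, cond(A1) = %g\n', d1, c1);
disp([x1 x2]);
```

A condition number far above unity is consistent with the small determinant noted in the text; both indicate that the computed solution is sensitive to small changes in the coefficients.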

Linear Systems 

Linear, algebraic equations occur in almost all branches of numerical analysis. But 
their most visible application in engineering is in the analysis of linear systems (any 
system whose response is proportional to the input is deemed to be linear). Linear 
systems include structures, elastic solids, heat flow, seepage of fluids, electromagnetic 
fields and electric circuits; i.e., most topics taught in an engineering curriculum. 

If the system is discrete, such as a truss or an electric circuit, then its analysis 
leads directly to linear algebraic equations. In the case of a statically determinate 
truss, for example, the equations arise when the equilibrium conditions of the joints 
are written down. The unknowns $x_1, x_2, \ldots, x_n$ represent the forces in the members 
and the support reactions, and the constants $b_1, b_2, \ldots, b_n$ are the prescribed external 
loads. 

The behavior of continuous systems is described by differential equations, rather 
than algebraic equations. However, because numerical analysis can deal only with 
discrete variables, it is first necessary to approximate a differential equation with a 
system of algebraic equations. The well-known finite difference, finite element and 
boundary element methods of analysis work in this manner. They use different approximations 
to achieve the “discretization,” but in each case the final task is the same: 
solve a system (often a very large system) of linear, algebraic equations. 

In summary, the modeling of linear systems invariably gives rise to equations of 
the form Ax = b, where b is the input and x represents the response of the system. 
The coefficient matrix A, which reflects the characteristics of the system, is independent 
of the input. In other words, if the input is changed, the equations have to 
be solved again with a different b, but the same A. Therefore, it is desirable to have 
an equation-solving algorithm that can handle any number of constant vectors with 
minimal computational effort. 

Methods of Solution 

There are two classes of methods for solving systems of linear, algebraic equations: 
direct and iterative methods. The common characteristic of direct methods is that they 
transform the original equations into equivalent equations (equations that have the 
same solution) that can be solved more easily. The transformation is carried out by 
applying the three operations listed below. These so-called elementary operations do 
not change the solution, but they may affect the determinant of the coefficient matrix 
as indicated in parentheses. 

• Exchanging two equations (changes sign of |A|). 

• Multiplying an equation by a nonzero constant (multiplies |A| by the same 
constant). 

• Multiplying an equation by a nonzero constant and then subtracting it from another 
equation (leaves |A| unchanged). 

Iterative, or indirect, methods start with a guess of the solution x, and then repeatedly 
refine the solution until a certain convergence criterion is reached. Iterative 
methods are generally less efficient than their direct counterparts due to the large 
number of iterations required. But they do have significant computational advantages 
if the coefficient matrix is very large and sparsely populated (most coefficients 
are zero). 
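
The effect of each elementary operation on the determinant can be checked numerically. The sketch below uses a 3 x 3 test matrix and the built-in det; the matrix and the names B, C and D are chosen purely for illustration.

```matlab
% Sketch: how the three elementary operations change the determinant.
A = [4 -2 1; -2 4 -2; 1 -2 4];

B = A([2 1 3],:);                        % exchange rows 1 and 2
C = A;  C(2,:) = 5*C(2,:);               % multiply row 2 by the constant 5
D = A;  D(3,:) = D(3,:) - 0.25*D(1,:);   % subtract 0.25 x row 1 from row 3

dA = det(A);    % reference value
dB = det(B);    % sign is reversed
dC = det(C);    % multiplied by 5
dD = det(D);    % unchanged

fprintf('%g  %g  %g  %g\n', dA, dB, dC, dD);
```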

Overview of Direct Methods 

Table 2.1 lists three popular direct methods, each of which uses elementary operations 
to produce its own final form of easy-to-solve equations. 


Method                       Initial form    Final form 
Gauss elimination            Ax = b          Ux = c 
LU decomposition             Ax = b          LUx = b 
Gauss-Jordan elimination     Ax = b          Ix = c 

Table 2.1 


In the above table U represents an upper triangular matrix, L is a lower triangular 
matrix and I denotes the identity matrix. A square matrix is called triangular if it 
contains only zero elements on one side of the leading diagonal. Thus a 3 x 3 upper 
triangular matrix has the form 


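$$
\mathbf{U} =
\begin{bmatrix}
U_{11} & U_{12} & U_{13} \\
0 & U_{22} & U_{23} \\
0 & 0 & U_{33}
\end{bmatrix}
$$

and a 3 x 3 lower triangular matrix appears as 

$$
\mathbf{L} =
\begin{bmatrix}
L_{11} & 0 & 0 \\
L_{21} & L_{22} & 0 \\
L_{31} & L_{32} & L_{33}
\end{bmatrix}
$$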


Triangular matrices play an important role in linear algebra, since they simplify 
many computations. For example, consider the equations Lx = c, or 


$$
\begin{aligned}
L_{11}x_1 &= c_1 \\
L_{21}x_1 + L_{22}x_2 &= c_2 \\
L_{31}x_1 + L_{32}x_2 + L_{33}x_3 &= c_3
\end{aligned}
$$


If we solve the equations forward, starting with the first equation, the computations 
are very easy, since each equation would contain only one unknown at a time. The 
solution would thus proceed as follows: 


$$
\begin{aligned}
x_1 &= c_1/L_{11} \\
x_2 &= (c_2 - L_{21}x_1)/L_{22} \\
x_3 &= (c_3 - L_{31}x_1 - L_{32}x_2)/L_{33}
\end{aligned}
$$

This procedure is known as forward substitution. In a similar way, Ux = c, encountered 
in Gauss elimination, can easily be solved by back substitution, which starts with the 
last equation and proceeds backward through the equations. 

The equations LUx = b, which are associated with LU decomposition, can also 
be solved quickly if we replace them with two sets of equivalent equations: Ly = b 
and Ux = y. Now Ly = b can be solved for y by forward substitution, followed by the 
solution of Ux = y by means of back substitution. 

The equations Ix = c, which are produced by Gauss-Jordan elimination, are 
equivalent to x = c (recall the identity Ix = x), so that c is already the solution. 
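
Forward and back substitution translate almost directly into loops. The sketch below is one possible way to write them, not necessarily the form used later in the text; the triangular matrices and the constant vector are borrowed from Example 2.2, and the variable names are illustrative.

```matlab
% Sketch: forward and back substitution for 3 x 3 triangular systems.
L = [2 0 0; -1 2 0; 1 -1 1];     % lower triangular matrix
U = [4 -3 1; 0 4 -3; 0 0 2];     % upper triangular matrix
c = [28; -40; 33];
n = length(c);

y = zeros(n,1);
for k = 1:n                      % forward substitution: solve L*y = c
    y(k) = (c(k) - L(k,1:k-1)*y(1:k-1))/L(k,k);
end

x = zeros(n,1);
for k = n:-1:1                   % back substitution: solve U*x = y
    x(k) = (y(k) - U(k,k+1:n)*x(k+1:n))/U(k,k);
end

disp([y x]);                     % columns: y and x
```

Each pass through a loop handles one equation containing a single new unknown; the product of an empty row and an empty column evaluates to zero, so the first step of each loop needs no special treatment.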

EXAMPLE 2.1 

Determine whether the following matrix is singular: 


$$
\mathbf{A} =
\begin{bmatrix}
2.1 & -0.6 & 1.1 \\
3.2 & 4.7 & -0.8 \\
3.1 & -6.5 & 4.1
\end{bmatrix}
$$

Solution Laplace’s development (see Appendix A2) of the determinant about the first 
row of A yields 

$$
\begin{aligned}
|\mathbf{A}| &= 2.1 \begin{vmatrix} 4.7 & -0.8 \\ -6.5 & 4.1 \end{vmatrix}
+ 0.6 \begin{vmatrix} 3.2 & -0.8 \\ 3.1 & 4.1 \end{vmatrix}
+ 1.1 \begin{vmatrix} 3.2 & 4.7 \\ 3.1 & -6.5 \end{vmatrix} \\
&= 2.1(14.07) + 0.6(15.60) + 1.1(-35.37) = 0
\end{aligned}
$$

Since the determinant is zero, the matrix is singular. It can be verified that the singularity 
is due to the following row dependency: (row 3) = (3 x row 1) - (row 2). 
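
The hand computation of Example 2.1 can be cross-checked with built-in functions. This is a sketch only: because of roundoff, det is expected to return a value close to zero rather than exactly zero, so the rank and the row dependency are checked as well.

```matlab
% Sketch: numerical check of Example 2.1.
A = [2.1 -0.6 1.1; 3.2 4.7 -0.8; 3.1 -6.5 4.1];

d     = det(A);                          % zero up to roundoff
r     = rank(A);                         % less than 3 for a singular matrix
resid = A(3,:) - (3*A(1,:) - A(2,:));    % row 3 minus (3 x row 1 - row 2)

fprintf('det(A) = %g, rank(A) = %d\n', d, r);
disp(resid);
```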

EXAMPLE 2.2 

Solve the equations Ax = b, where 


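$$
\mathbf{A} =
\begin{bmatrix}
8 & -6 & 2 \\
-4 & 11 & -7 \\
4 & -7 & 6
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix} 28 \\ -40 \\ 33 \end{bmatrix}
$$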

knowing that the LU decomposition of the coefficient matrix is (you should verify this) 



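$$
\mathbf{A} = \mathbf{L}\mathbf{U} =
\begin{bmatrix}
2 & 0 & 0 \\
-1 & 2 & 0 \\
1 & -1 & 1
\end{bmatrix}
\begin{bmatrix}
4 & -3 & 1 \\
0 & 4 & -3 \\
0 & 0 & 2
\end{bmatrix}
$$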


Solution We first solve the equations Ly = b by forward substitution: 

$$
\begin{aligned}
2y_1 &= 28 & y_1 &= 28/2 = 14 \\
-y_1 + 2y_2 &= -40 & y_2 &= (-40 + y_1)/2 = (-40 + 14)/2 = -13 \\
y_1 - y_2 + y_3 &= 33 & y_3 &= 33 - y_1 + y_2 = 33 - 14 - 13 = 6
\end{aligned}
$$

The solution x is then obtained from Ux = y by back substitution: 

$$
\begin{aligned}
2x_3 &= y_3 & x_3 &= y_3/2 = 6/2 = 3 \\
4x_2 - 3x_3 &= y_2 & x_2 &= (y_2 + 3x_3)/4 = [-13 + 3(3)]/4 = -1 \\
4x_1 - 3x_2 + x_3 &= y_1 & x_1 &= (y_1 + 3x_2 - x_3)/4 = [14 + 3(-1) - 3]/4 = 2
\end{aligned}
$$

Hence the solution is $\mathbf{x} = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix}^T$. 
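
The verification suggested in Example 2.2 takes only a few lines. The sketch below multiplies the given factors and then solves the two triangular systems with MATLAB's backslash operator, which is assumed here as a convenient stand-in for hand substitution.

```matlab
% Sketch: check the LU factors and the solution of Example 2.2.
A = [8 -6 2; -4 11 -7; 4 -7 6];
b = [28; -40; 33];
L = [2 0 0; -1 2 0; 1 -1 1];
U = [4 -3 1; 0 4 -3; 0 0 2];

err = norm(L*U - A);       % zero if the decomposition is correct
y   = L\b;                 % solves L*y = b
x   = U\y;                 % solves U*x = y
res = norm(A*x - b);       % residual of the original equations

fprintf('norm(L*U - A) = %g, residual = %g\n', err, res);
disp(x');
```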


2.2 Gauss Elimination Method 
Introduction 

Gauss elimination is the most familiar method for solving simultaneous equations. It 
consists of two parts: the elimination phase and the solution phase. As indicated in 
Table 2.1, the function of the elimination phase is to transform the equations into the 
form Ux = c. The equations are then solved by back substitution. In order to illustrate 
the procedure, let us solve the equations 

$$
\begin{aligned}
4x_1 - 2x_2 + x_3 &= 11 && \text{(a)} \\
-2x_1 + 4x_2 - 2x_3 &= -16 && \text{(b)} \\
x_1 - 2x_2 + 4x_3 &= 17 && \text{(c)}
\end{aligned}
$$


Elimination phase The elimination phase utilizes only one of the elementary operations 
listed in Art. 2.1: multiplying one equation (say, equation j) by a constant 
λ and subtracting it from another equation (equation i). The symbolic representation 
of this operation is 

$$
\text{Eq. } (i) \leftarrow \text{Eq. } (i) - \lambda \times \text{Eq. } (j) \tag{2.6}
$$

The equation being subtracted, namely Eq. (j), is called the pivot equation. 

We start the elimination by taking Eq. (a) to be the pivot equation and choosing 
the multipliers λ so as to eliminate $x_1$ from Eqs. (b) and (c): 

$$
\begin{aligned}
\text{Eq. (b)} &\leftarrow \text{Eq. (b)} - (-0.5) \times \text{Eq. (a)} \\
\text{Eq. (c)} &\leftarrow \text{Eq. (c)} - 0.25 \times \text{Eq. (a)}
\end{aligned}
$$

After this transformation, the equations become 


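$$
\begin{aligned}
4x_1 - 2x_2 + x_3 &= 11 && \text{(a)} \\
3x_2 - 1.5x_3 &= -10.5 && \text{(b)} \\
-1.5x_2 + 3.75x_3 &= 14.25 && \text{(c)}
\end{aligned}
$$

This completes the first pass. Now we pick (b) as the pivot equation and eliminate $x_2$ 
from (c): 

$$
\text{Eq. (c)} \leftarrow \text{Eq. (c)} - (-0.5) \times \text{Eq. (b)}
$$

which yields the equations 

$$
\begin{aligned}
4x_1 - 2x_2 + x_3 &= 11 && \text{(a)} \\
3x_2 - 1.5x_3 &= -10.5 && \text{(b)} \\
3x_3 &= 9 && \text{(c)}
\end{aligned}
$$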

The elimination phase is now complete. The original equations have been replaced 
by equivalent equations that can be easily solved by back substitution. 

As pointed out before, the augmented coefficient matrix is a more convenient 
instrument for performing the computations. Thus the original equations would be 
written as 

$$
\left[\begin{array}{rrr|r}
4 & -2 & 1 & 11 \\
-2 & 4 & -2 & -16 \\
1 & -2 & 4 & 17
\end{array}\right]
$$

and the equivalent equations produced by the first and the second passes of Gauss 
elimination would appear as 

$$
\left[\begin{array}{rrr|r}
4 & -2 & 1 & 11.00 \\
0 & 3 & -1.5 & -10.50 \\
0 & -1.5 & 3.75 & 14.25
\end{array}\right]
$$

$$
\left[\begin{array}{rrr|r}
4 & -2 & 1 & 11.0 \\
0 & 3 & -1.5 & -10.5 \\
0 & 0 & 3 & 9.0
\end{array}\right]
$$


It is important to note that the elementary row operation in Eq. (2.6) leaves 
the determinant of the coefficient matrix unchanged. This is rather fortunate, since 
the determinant of a triangular matrix is very easy to compute: it is the product 
of the diagonal elements (you can verify this quite easily). In other words, 

$$
|\mathbf{A}| = |\mathbf{U}| = U_{11} \times U_{22} \times \cdots \times U_{nn} \tag{2.7}
$$

Back substitution phase The unknowns can now be computed by back substitution 
in the manner described in the previous article. Solving Eqs. (c), (b) and (a) in 
that order, we get 

$$
\begin{aligned}
x_3 &= 9/3 = 3 \\
x_2 &= (-10.5 + 1.5x_3)/3 = [-10.5 + 1.5(3)]/3 = -2 \\
x_1 &= (11 + 2x_2 - x_3)/4 = [11 + 2(-2) - 3]/4 = 1
\end{aligned}
$$
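
The hand computation above is easy to confirm with built-in functions. The sketch below assumes MATLAB's backslash solver and det; the diagonal of U is copied from the final augmented matrix so that Eq. (2.7) can be compared with det(A).

```matlab
% Sketch: check of the worked Gauss elimination example.
A = [4 -2 1; -2 4 -2; 1 -2 4];
b = [11; -16; 17];

x     = A\b;                    % expected to be close to [1; -2; 3]
dA    = det(A);                 % determinant of the original matrix
dU    = prod([4 3 3]);          % product of the diagonal of U, Eq. (2.7)

fprintf('det(A) = %g, product of pivots = %g\n', dA, dU);
disp(x');
```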


Algorithm for Gauss Elimination Method 

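Elimination phase Let us look at the equations at some instant during the elimination 
phase. Assume that the first k rows of A have already been transformed to 
upper triangular form. Therefore, the current pivot equation is the kth equation, and 
all the equations below it are still to be transformed. This situation is depicted by the 
augmented coefficient matrix shown below. Note that the components of A are not 
the coefficients of the original equations (except for the first row), since they have 
been altered by the elimination procedure. The same applies to the components of 
the constant vector b. 

$$
\left[\begin{array}{ccccccccc|c}
A_{11} & A_{12} & A_{13} & \cdots & A_{1k} & \cdots & A_{1j} & \cdots & A_{1n} & b_1 \\
0 & A_{22} & A_{23} & \cdots & A_{2k} & \cdots & A_{2j} & \cdots & A_{2n} & b_2 \\
0 & 0 & A_{33} & \cdots & A_{3k} & \cdots & A_{3j} & \cdots & A_{3n} & b_3 \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & A_{kk} & \cdots & A_{kj} & \cdots & A_{kn} & b_k \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & A_{ik} & \cdots & A_{ij} & \cdots & A_{in} & b_i \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & A_{nk} & \cdots & A_{nj} & \cdots & A_{nn} & b_n
\end{array}\right]
$$

Here row k is the pivot row and row i is the row being transformed. 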


Let the ith row be a typical row below the pivot equation that is to be transformed, 
meaning that the element $A_{ik}$ is to be eliminated. We can achieve this by multiplying 
the pivot row by $\lambda = A_{ik}/A_{kk}$ and subtracting it from the ith row. The corresponding 
changes in the ith row are 

$$
A_{ij} \leftarrow A_{ij} - \lambda A_{kj}, \quad j = k, k+1, \ldots, n \tag{2.8a}
$$

$$
b_i \leftarrow b_i - \lambda b_k \tag{2.8b}
$$

To transform the entire coefficient matrix to upper triangular form, k and i in Eqs. (2.8) 
must have the ranges k = 1, 2, ..., n-1 (chooses the pivot row), i = k+1, k+2, ..., n 
(chooses the row to be transformed). The algorithm for the elimination phase now 
almost writes itself: 


for k = 1:n-1 
    for i = k+1:n 
        if A(i,k) ~= 0 
            lambda = A(i,k)/A(k,k); 
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n); 
            b(i) = b(i) - lambda*b(k); 
        end 
    end 
end 


In order to avoid unnecessary operations, the above algorithm departs slightly 
from Eqs. (2.8) in the following ways: 

• If $A_{ik}$ happens to be zero, the transformation of row i is skipped. 

• The index j in Eq. (2.8a) starts with k+1 rather than k. Therefore, $A_{ik}$ is not replaced 
by zero, but retains its original value. As the solution phase never accesses 
the lower triangular portion of the coefficient matrix anyway, its contents are 
irrelevant. 


Back substitution phase After Gauss elimination the augmented coefficient matrix 
has the form 


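$$
\left[\begin{array}{ccccc|c}
A_{11} & A_{12} & A_{13} & \cdots & A_{1n} & b_1 \\
0 & A_{22} & A_{23} & \cdots & A_{2n} & b_2 \\
0 & 0 & A_{33} & \cdots & A_{3n} & b_3 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & A_{nn} & b_n
\end{array}\right]
$$

The last equation, $A_{nn}x_n = b_n$, is solved first, yielding 

$$
x_n = b_n/A_{nn} \tag{2.9}
$$

Consider now the stage of back substitution where $x_n, x_{n-1}, \ldots, x_{k+1}$ have 
already been computed (in that order), and we are about to determine $x_k$ from the kth 
equation 

$$
A_{kk}x_k + A_{k,k+1}x_{k+1} + \cdots + A_{kn}x_n = b_k
$$

The solution is 

$$
x_k = \left( b_k - \sum_{j=k+1}^{n} A_{kj}x_j \right) \frac{1}{A_{kk}}, \quad k = n-1, n-2, \ldots, 1 \tag{2.10}
$$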

The corresponding algorithm for back substitution is: 


for k = n:-1:1 
    b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k); 
end 


■ gauss 

The function gauss combines the elimination and the back substitution phases. During 
back substitution b is overwritten by the solution vector x, so that b contains the 
solution upon exit. 

function [x,det] = gauss(A,b) 
% Solves A*x = b by Gauss elimination and computes det(A). 
% USAGE: [x,det] = gauss(A,b) 

if size(b,2) > 1; b = b'; end       % b must be column vector 
n = length(b); 
for k = 1:n-1                       % Elimination phase 
    for i = k+1:n 
        if A(i,k) ~= 0 
            lambda = A(i,k)/A(k,k); 
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n); 
            b(i) = b(i) - lambda*b(k); 
        end 
    end 
end 
if nargout == 2; det = prod(diag(A)); end 
for k = n:-1:1                      % Back substitution phase 
    b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k); 
end 
x = b; 

Multiple Sets of Equations 

As mentioned before, it is frequently necessary to solve the equations Ax = b for 
several constant vectors. Let there be m such constant vectors, denoted by 
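$\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m$, and let the corresponding solution vectors be $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$. We denote 
multiple sets of equations by AX = B, where 

$$
\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_m \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_m \end{bmatrix}
$$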

are n x m matrices whose columns consist of solution vectors and constant vectors, 
respectively. 

An economical way to handle such equations during the elimination phase is 
to include all m constant vectors in the augmented coefficient matrix, so that they 
are transformed simultaneously with the coefficient matrix. The solutions are then 
obtained by back substitution in the usual manner, one vector at a time. It would be quite 
easy to make the corresponding changes in gauss. However, the LU decomposition 
method, described in the next article, is more versatile in handling multiple constant 
vectors. 
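
One way the changes to gauss might look is sketched below as a stand-alone script rather than a function. It is an assumption about how the extension could be written, not the author's code: the constant vectors are stored as the columns of B, every row operation is applied to a whole row of B, and back substitution fills X one row at a time. The data are those of Example 2.3.

```matlab
% Sketch: Gauss elimination with several constant vectors (columns of B).
A = [6 -4 1; -4 6 -4; 1 -4 6];
B = [-14 22; 36 -18; 6 7];
n = size(A,1);

for k = 1:n-1                            % elimination phase
    for i = k+1:n
        if A(i,k) ~= 0
            lambda = A(i,k)/A(k,k);
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n);
            B(i,:) = B(i,:) - lambda*B(k,:);   % all constant vectors at once
        end
    end
end

X = zeros(size(B));
for k = n:-1:1                           % back substitution phase
    X(k,:) = (B(k,:) - A(k,k+1:n)*X(k+1:n,:))/A(k,k);
end

disp(X);                                 % columns are the solution vectors
```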

EXAMPLE 2.3 

Use Gauss elimination to solve the equations AX = B, where 


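$$
\mathbf{A} =
\begin{bmatrix}
6 & -4 & 1 \\
-4 & 6 & -4 \\
1 & -4 & 6
\end{bmatrix}
\qquad
\mathbf{B} =
\begin{bmatrix}
-14 & 22 \\
36 & -18 \\
6 & 7
\end{bmatrix}
$$

Solution The augmented coefficient matrix is 

$$
\left[\begin{array}{rrr|rr}
6 & -4 & 1 & -14 & 22 \\
-4 & 6 & -4 & 36 & -18 \\
1 & -4 & 6 & 6 & 7
\end{array}\right]
$$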

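The elimination phase consists of the following two passes: 

$$
\begin{aligned}
\text{row 2} &\leftarrow \text{row 2} + (2/3) \times \text{row 1} \\
\text{row 3} &\leftarrow \text{row 3} - (1/6) \times \text{row 1}
\end{aligned}
$$

$$
\left[\begin{array}{rrr|rr}
6 & -4 & 1 & -14 & 22 \\
0 & 10/3 & -10/3 & 80/3 & -10/3 \\
0 & -10/3 & 35/6 & 25/3 & 10/3
\end{array}\right]
$$

and 

$$
\text{row 3} \leftarrow \text{row 3} + \text{row 2}
$$

$$
\left[\begin{array}{rrr|rr}
6 & -4 & 1 & -14 & 22 \\
0 & 10/3 & -10/3 & 80/3 & -10/3 \\
0 & 0 & 5/2 & 35 & 0
\end{array}\right]
$$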

In the solution phase, we first compute $\mathbf{x}_1$ by back substitution: 

$$
\begin{aligned}
X_{31} &= \frac{35}{5/2} = 14 \\
X_{21} &= \frac{80/3 + (10/3)X_{31}}{10/3} = \frac{80/3 + (10/3)14}{10/3} = 22 \\
X_{11} &= \frac{-14 + 4X_{21} - X_{31}}{6} = \frac{-14 + 4(22) - 14}{6} = 10
\end{aligned}
$$

Thus the first solution vector is 

$$
\mathbf{x}_1 = \begin{bmatrix} X_{11} & X_{21} & X_{31} \end{bmatrix}^T = \begin{bmatrix} 10 & 22 & 14 \end{bmatrix}^T
$$

The second solution vector is computed next, also using back substitution: 

$$
\begin{aligned}
X_{32} &= 0 \\
X_{22} &= \frac{-10/3 + (10/3)X_{32}}{10/3} = \frac{-10/3 + 0}{10/3} = -1 \\
X_{12} &= \frac{22 + 4X_{22} - X_{32}}{6} = \frac{22 + 4(-1) - 0}{6} = 3
\end{aligned}
$$

Therefore, 

$$
\mathbf{x}_2 = \begin{bmatrix} X_{12} & X_{22} & X_{32} \end{bmatrix}^T = \begin{bmatrix} 3 & -1 & 0 \end{bmatrix}^T
$$

EXAMPLE 2.4 

An n x n Vandermonde matrix A is defined by 

$$
A_{ij} = v_i^{\,n-j}, \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, n
$$

where v is a vector. In MATLAB a Vandermonde matrix can be generated by the command 
vander(v). Use the function gauss to compute the solution of Ax = b, where 
A is the 6 x 6 Vandermonde matrix generated from the vector 

$$
\mathbf{v} = \begin{bmatrix} 1.0 & 1.2 & 1.4 & 1.6 & 1.8 & 2.0 \end{bmatrix}^T
$$

and 

$$
\mathbf{b} = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}^T
$$

Also evaluate the accuracy of the solution (Vandermonde matrices tend to be ill-conditioned). 

Solution We used the program shown below. After constructing A and b, the output 
format was changed to long so that the solution would be printed to 14 decimal 
places. Here are the results: 

% Example 2.4 (Gauss elimination) 

A = vander(1:0.2:2); 
b = [0 1 0 1 0 1]'; 
format long 
[x,det] = gauss(A,b) 
x = 

1.0e+004 * 

0.04166666666701 

-0.31250000000246 

0.92500000000697 

-1.35000000000972 

0.97093333334002 

-0.27510000000181 

det = 

-1.132462079991823e-006 

As the determinant is quite small relative to the elements of A (you may want to 
print A to verify this), we expect detectable roundoff error. Inspection of x leads us to 
suspect that the exact solution is 

$$
\mathbf{x} = \begin{bmatrix} 1250/3 & -3125 & 9250 & -13500 & 29128/3 & -2751 \end{bmatrix}^T
$$

in which case the numerical solution would be accurate to 9 decimal places. 

Another way to gauge the accuracy of the solution is to compute Ax and compare 
the result to b: 

» A*x 
ans = 

  -0.00000000000091 

   0.99999999999909 

  -0.00000000000819 

   0.99999999998272 

  -0.00000000005366 

   0.99999999994998 

The result seems to confirm our previous conclusion. 
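
The same accuracy check can be condensed into a single number by using the relative residual. The sketch below is not the program used in the example: it substitutes MATLAB's built-in backslash solver for gauss so that the script is self-contained, and it also prints the 2-norm condition number returned by cond as a rough indicator of ill-conditioning.

```matlab
% Sketch: residual and condition number for the system of Example 2.4.
A = vander(1:0.2:2);
b = [0 1 0 1 0 1]';

x = A\b;                           % built-in solver, used here instead of gauss
r = norm(A*x - b)/norm(b);         % relative residual
c = cond(A);                       % 2-norm condition number

fprintf('relative residual = %g, cond(A) = %g\n', r, c);
```

A small residual shows only that the computed x satisfies the equations closely; for an ill-conditioned matrix the error in x itself can be considerably larger than the residual suggests.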


2.3 LU Decomposition Methods 
Introduction 

It is possible to show that any square matrix A can be expressed as a product of a lower 
triangular matrix L and an upper triangular matrix U: 

A = LU (2.11) 

The process of computing L and U for a given A is known as LU decomposition or 
LU factorization. LU decomposition is not unique (the combinations of L and U for 
a prescribed A are endless), unless certain constraints are placed on L or U. These 
constraints distinguish one type of decomposition from another. Three commonly 
used decompositions are listed in Table 2.2. 


Name                         Constraints 
Doolittle’s decomposition    $L_{ii} = 1$, $i = 1, 2, \ldots, n$ 
Crout’s decomposition        $U_{ii} = 1$, $i = 1, 2, \ldots, n$ 
Choleski’s decomposition     $\mathbf{L} = \mathbf{U}^T$ 

Table 2.2 


After decomposing A, it is easy to solve the equations Ax = b, as pointed out in 
Art. 2.1. We first rewrite the equations as LUx = b. Upon using the notation Ux = y, 
the equations become 


Ly = b 

which can be solved for y by forward substitution. Then 

Ux = y 


will yield x by the back substitution process. 

The advantage of LU decomposition over the Gauss elimination method is that 
once A is decomposed, we can solve Ax = b for as many constant vectors b as we 
please. The cost of each additional solution is relatively small, since the forward and 
back substitution operations are much less time consuming than the decomposition 
process. 
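
The reuse of a single decomposition for several constant vectors can be illustrated with MATLAB's built-in lu function. Note the assumption: lu applies row pivoting and returns a permutation matrix P such that P*A = L*U, so its factors generally differ from those produced by the methods described in this article. The data are taken from Example 2.3.

```matlab
% Sketch: decompose once, then solve for several constant vectors.
A = [6 -4 1; -4 6 -4; 1 -4 6];
B = [-14 22; 36 -18; 6 7];         % each column is one constant vector

[L,U,P] = lu(A);                   % decomposition, done only once (P*A = L*U)

X = zeros(size(B));
for j = 1:size(B,2)                % cheap solution phase for each vector
    y = L\(P*B(:,j));              % forward substitution
    X(:,j) = U\y;                  % back substitution
end

disp(X);
```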


Doolittle's Decomposition Method 

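Decomposition phase Doolittle’s decomposition is closely related to Gauss elimination. 
In order to illustrate the relationship, consider a 3 x 3 matrix A and assume 
that there exist triangular matrices 

$$
\mathbf{L} =
\begin{bmatrix}
1 & 0 & 0 \\
L_{21} & 1 & 0 \\
L_{31} & L_{32} & 1
\end{bmatrix}
\qquad
\mathbf{U} =
\begin{bmatrix}
U_{11} & U_{12} & U_{13} \\
0 & U_{22} & U_{23} \\
0 & 0 & U_{33}
\end{bmatrix}
$$

such that A = LU. After completing the multiplication on the right hand side, we get 

$$
\mathbf{A} =
\begin{bmatrix}
U_{11} & U_{12} & U_{13} \\
U_{11}L_{21} & U_{12}L_{21} + U_{22} & U_{13}L_{21} + U_{23} \\
U_{11}L_{31} & U_{12}L_{31} + U_{22}L_{32} & U_{13}L_{31} + U_{23}L_{32} + U_{33}
\end{bmatrix}
\tag{2.12}
$$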


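Let us now apply Gauss elimination to Eq. (2.12). The first pass of the elimination 
procedure consists of choosing the first row as the pivot row and applying the 
elementary operations 

$$
\begin{aligned}
\text{row 2} &\leftarrow \text{row 2} - L_{21} \times \text{row 1} \quad (\text{eliminates } A_{21}) \\
\text{row 3} &\leftarrow \text{row 3} - L_{31} \times \text{row 1} \quad (\text{eliminates } A_{31})
\end{aligned}
$$

The result is 

$$
\mathbf{A}' =
\begin{bmatrix}
U_{11} & U_{12} & U_{13} \\
0 & U_{22} & U_{23} \\
0 & U_{22}L_{32} & U_{23}L_{32} + U_{33}
\end{bmatrix}
$$

In the next pass we take the second row as the pivot row, and utilize the operation 

$$
\text{row 3} \leftarrow \text{row 3} - L_{32} \times \text{row 2} \quad (\text{eliminates } A_{32})
$$

ending up with 

$$
\mathbf{A}'' = \mathbf{U} =
\begin{bmatrix}
U_{11} & U_{12} & U_{13} \\
0 & U_{22} & U_{23} \\
0 & 0 & U_{33}
\end{bmatrix}
$$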


The foregoing illustration reveals two important features of Doolittle’s decomposition: 


• The matrix U is identical to the upper triangular matrix that results from Gauss 
elimination. 

• The off-diagonal elements of L are the pivot equation multipliers used during 
Gauss elimination; that is, $L_{ij}$ is the multiplier that eliminated $A_{ij}$. 

It is usual practice to store the multipliers in the lower triangular portion of the 
coefficient matrix, replacing the coefficients as they are eliminated ($L_{ij}$ replacing $A_{ij}$). 
The diagonal elements of L do not have to be stored, since it is understood that each 
of them is unity. The final form of the coefficient matrix would thus be the following 
mixture of L and U: 

$$
[\mathbf{L}\backslash\mathbf{U}] =
\begin{bmatrix}
U_{11} & U_{12} & U_{13} \\
L_{21} & U_{22} & U_{23} \\
L_{31} & L_{32} & U_{33}
\end{bmatrix}
\tag{2.13}
$$


The algorithm for Doolittle’s decomposition is thus identical to the Gauss elimination 
procedure in gauss, except that each multiplier λ is now stored in the lower 
triangular portion of A. 


■ LUdec 

In this version of LU decomposition the original A is destroyed and replaced by its 
decomposed form [L\U]. 

function A = LUdec(A) 
% LU decomposition of matrix A; returns A = [L\U]. 
% USAGE: A = LUdec(A) 

n = size(A,1); 
for k = 1:n-1 
    for i = k+1:n 
        if A(i,k) ~= 0.0 
            lambda = A(i,k)/A(k,k); 
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n); 
            A(i,k) = lambda; 
        end 
    end 
end 


Solution phase Consider now the procedure for solving Ly = b by forward substitution. 
The scalar form of the equations is (recall that $L_{ii} = 1$) 

$$
\begin{aligned}
y_1 &= b_1 \\
L_{21}y_1 + y_2 &= b_2 \\
&\ \,\vdots \\
L_{k1}y_1 + L_{k2}y_2 + \cdots + L_{k,k-1}y_{k-1} + y_k &= b_k \\
&\ \,\vdots
\end{aligned}
$$

Solving the kth equation for $y_k$ yields 

$$
y_k = b_k - \sum_{j=1}^{k-1} L_{kj}y_j, \quad k = 2, 3, \ldots, n \tag{2.14}
$$

Letting y overwrite b, we obtain the forward substitution algorithm: 

for k = 2:n 
    y(k) = b(k) - A(k,1:k-1)*y(1:k-1); 
end 


The back substitution phase for solving Ux = y is identical to that used in the 
Gauss elimination method. 


■ LUsol 

This function carries out the solution phase (forward and back substitutions). It is 
assumed that the original coefficient matrix has been decomposed, so that the input 
is A = [L\U]. The contents of b are replaced by y during forward substitution. Similarly, 
back substitution overwrites y with the solution x. 

function x = LUsol(A,b)
% Solves L*U*x = b, where A contains both L and U;
% that is, A has the form [L\U].
% USAGE: x = LUsol(A,b)
if size(b,2) > 1; b = b'; end
n = length(b);
for k = 2:n
    b(k) = b(k) - A(k,1:k-1)*b(1:k-1);
end
for k = n:-1:1
    b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k);
end
x = b;
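
As a quick check of the two phases, the following self-contained sketch repeats the loops of LUdec and LUsol inline on a small test system and compares the result with the known solution and with MATLAB's backslash operator. The test data and the names A0, b0, errLU and errX are assumptions made for illustration only.

```matlab
% Sketch: Doolittle decomposition and solution phase written inline.
% The test system below is an assumption; its solution is x = [1; -2; 3].
A = [4 -2 1; -2 4 -2; 1 -2 4];
b = [11; -16; 17];
A0 = A;  b0 = b;                 % keep copies for the checks
n = size(A,1);
for k = 1:n-1                    % decomposition phase: A becomes [L\U]
    for i = k+1:n
        if A(i,k) ~= 0.0
            lambda = A(i,k)/A(k,k);
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n);
            A(i,k) = lambda;     % multiplier stored in place of the zero
        end
    end
end
L = tril(A,-1) + eye(n);         % unit lower triangular factor
U = triu(A);                     % upper triangular factor
for k = 2:n                      % forward substitution, y overwrites b
    b(k) = b(k) - A(k,1:k-1)*b(1:k-1);
end
for k = n:-1:1                   % back substitution, x overwrites y
    b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k);
end
x = b;
errLU = norm(L*U - A0);          % should be at roundoff level
errX  = norm(x - A0\b0);         % agreement with the built-in solver
```

Extracting L and U explicitly is done here only to verify the product LU; the solution phase itself works directly on the compact form [L\U].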

Choleski's Decomposition 

Choleski’s decomposition $\mathbf{A} = \mathbf{L}\mathbf{L}^T$ has two limitations:

• Since the matrix product $\mathbf{L}\mathbf{L}^T$ is symmetric, Choleski’s decomposition requires A
to be symmetric.

• The decomposition process involves taking square roots of certain combinations 
of the elements of A. It can be shown that square roots of negative numbers can 
be avoided only if A is positive definite. 

Although the number of long operations in all the decomposition methods is 
about the same, Choleski’s decomposition is not a particularly popular means of 
solving simultaneous equations, mainly due to the restrictions listed above. We study 
it here because it is invaluable in certain other applications (e.g., in the transformation 
of eigenvalue problems). 

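Let us start by looking at Choleski’s decomposition

$$
\mathbf{A} = \mathbf{L}\mathbf{L}^T \tag{2.15}
$$

of a 3 × 3 matrix:

$$
\begin{bmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{bmatrix}
=
\begin{bmatrix}
L_{11} & 0 & 0 \\
L_{21} & L_{22} & 0 \\
L_{31} & L_{32} & L_{33}
\end{bmatrix}
\begin{bmatrix}
L_{11} & L_{21} & L_{31} \\
0 & L_{22} & L_{32} \\
0 & 0 & L_{33}
\end{bmatrix}
$$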

After completing the matrix multiplication on the right-hand side, we get

$$
\begin{bmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{bmatrix}
=
\begin{bmatrix}
L_{11}^2 & L_{11}L_{21} & L_{11}L_{31} \\
L_{11}L_{21} & L_{21}^2 + L_{22}^2 & L_{21}L_{31} + L_{22}L_{32} \\
L_{11}L_{31} & L_{21}L_{31} + L_{22}L_{32} & L_{31}^2 + L_{32}^2 + L_{33}^2
\end{bmatrix}
\tag{2.16}
$$

Note that the right-hand-side matrix is symmetric, as pointed out before. Equating the
matrices A and $\mathbf{L}\mathbf{L}^T$ element by element, we obtain six equations (due to symmetry,
only the lower or upper triangular elements have to be considered) in the six unknown
components of L. By solving these equations in a certain order, it is possible to have
only one unknown in each equation.

Consider the lower triangular portion of each matrix in Eq. (2.16) (the upper
triangular portion would do as well). By equating the elements in the first column,
starting with the first row and proceeding downward, we can compute $L_{11}$, $L_{21}$, and
$L_{31}$ in that order:

$$
\begin{aligned}
A_{11} &= L_{11}^2 & L_{11} &= \sqrt{A_{11}} \\
A_{21} &= L_{11}L_{21} & L_{21} &= A_{21}/L_{11} \\
A_{31} &= L_{11}L_{31} & L_{31} &= A_{31}/L_{11}
\end{aligned}
$$

The second column, starting with the second row, yields $L_{22}$ and $L_{32}$:

$$
\begin{aligned}
A_{22} &= L_{21}^2 + L_{22}^2 & L_{22} &= \sqrt{A_{22} - L_{21}^2} \\
A_{32} &= L_{21}L_{31} + L_{22}L_{32} & L_{32} &= (A_{32} - L_{21}L_{31})/L_{22}
\end{aligned}
$$

Finally, the third column, third row gives us $L_{33}$:

$$
A_{33} = L_{31}^2 + L_{32}^2 + L_{33}^2 \qquad L_{33} = \sqrt{A_{33} - L_{31}^2 - L_{32}^2}
$$

We can now extrapolate the results for an n × n matrix. We observe that a typical
element in the lower triangular portion of $\mathbf{L}\mathbf{L}^T$ is of the form

$$
(\mathbf{L}\mathbf{L}^T)_{ij} = L_{i1}L_{j1} + L_{i2}L_{j2} + \cdots + L_{ij}L_{jj} = \sum_{k=1}^{j} L_{ik}L_{jk}, \quad i \ge j
$$

Equating this term to the corresponding element of A yields

$$
A_{ij} = \sum_{k=1}^{j} L_{ik}L_{jk}, \quad i = j, j+1, \ldots, n, \quad j = 1, 2, \ldots, n \tag{2.17}
$$

The range of indices shown limits the elements to the lower triangular part. For the
first column (j = 1), we obtain from Eq. (2.17)

$$
L_{11} = \sqrt{A_{11}} \qquad L_{i1} = A_{i1}/L_{11}, \quad i = 2, 3, \ldots, n \tag{2.18}
$$

Proceeding to other columns, we observe that the unknown in Eq. (2.17) is $L_{ij}$ (the
other elements of L appearing in the equation have already been computed). Taking
the term containing $L_{ij}$ outside the summation in Eq. (2.17), we obtain

$$
A_{ij} = \sum_{k=1}^{j-1} L_{ik}L_{jk} + L_{ij}L_{jj}
$$

If i = j (a diagonal term), the solution is

$$
L_{jj} = \sqrt{A_{jj} - \sum_{k=1}^{j-1} L_{jk}^2}, \quad j = 2, 3, \ldots, n \tag{2.19}
$$

For a nondiagonal term we get

$$
L_{ij} = \left( A_{ij} - \sum_{k=1}^{j-1} L_{ik}L_{jk} \right) / L_{jj}, \quad j = 2, 3, \ldots, n-1, \quad i = j+1, j+2, \ldots, n \tag{2.20}
$$


■ choleski 

Note that in Eqs. (2.19) and (2.20) $A_{ij}$ appears only in the formula for $L_{ij}$. Therefore,
once $L_{ij}$ has been computed, $A_{ij}$ is no longer needed. This makes it possible to write
the elements of L over the lower triangular portion of A as they are computed. The
elements above the principal diagonal of A will remain untouched. At the conclusion
of decomposition L is extracted with the MATLAB command tril(A). If a negative value of
$L_{jj}^2$ is encountered during decomposition, an error message is printed and the program
is terminated.

function L = choleski(A)
% Computes L in Choleski's decomposition A = LL'.
% USAGE: L = choleski(A)
n = size(A,1);
for j = 1:n
    temp = A(j,j) - dot(A(j,1:j-1),A(j,1:j-1));
    if temp < 0.0
        error('Matrix is not positive definite')
    end
    A(j,j) = sqrt(temp);
    for i = j+1:n
        A(i,j) = (A(i,j) - dot(A(i,1:j-1),A(j,1:j-1)))/A(j,j);
    end
end
L = tril(A);
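
One way to gain confidence in the column-by-column formulas is to compare their result with MATLAB's built-in chol. The sketch below is a hedged illustration: it repeats the loop of choleski inline (with the dot products written as row-times-column products, an implementation choice made here), uses the matrix of Example 2.6 as test data, and introduces the names A0, Lref, errChol and errProd purely for the check.

```matlab
% Sketch: Choleski factor from Eqs. (2.19)-(2.20) versus built-in chol.
A = [4 -2 2; -2 2 -4; 2 -4 11];  % test matrix (the one used in Example 2.6)
A0 = A;                          % copy kept for the checks
n = size(A,1);
for j = 1:n
    % Row-times-column product; an empty range contributes zero
    temp = A(j,j) - A(j,1:j-1)*A(j,1:j-1)';
    if temp < 0.0
        error('Matrix is not positive definite')
    end
    A(j,j) = sqrt(temp);
    for i = j+1:n
        A(i,j) = (A(i,j) - A(i,1:j-1)*A(j,1:j-1)')/A(j,j);
    end
end
L = tril(A);                     % discard the untouched upper triangle
Lref = chol(A0,'lower');         % built-in lower triangular factor
errChol = norm(L - Lref);        % should be at roundoff level
errProd = norm(L*L' - A0);       % checks A = L*L'
```

Note that chol without the 'lower' option returns the upper triangular factor, i.e., the transpose of L.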

We could also write the algorithm for forward and back substitutions that are
necessary in the solution of Ax = b. But since Choleski’s decomposition has no
advantages over Doolittle’s decomposition in the solution of simultaneous equations,
we will skip that part.

EXAMPLE 2.5

Use Doolittle’s decomposition method to solve the equations Ax = b, where

$$
\mathbf{A} =
\begin{bmatrix}
1 & 4 & 1 \\
1 & 6 & -1 \\
2 & -1 & 2
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
7 \\ 13 \\ 5
\end{bmatrix}
$$


Solution We first decompose A by Gauss elimination. The first pass consists of the
elementary operations

    row 2 ← row 2 − 1 × row 1 (eliminates $A_{21}$)
    row 3 ← row 3 − 2 × row 1 (eliminates $A_{31}$)

Storing the multipliers $L_{21} = 1$ and $L_{31} = 2$ in place of the eliminated terms, we obtain

$$
\mathbf{A}' =
\begin{bmatrix}
1 & 4 & 1 \\
1 & 2 & -2 \\
2 & -9 & 0
\end{bmatrix}
$$

The second pass of Gauss elimination uses the operation

    row 3 ← row 3 − (−4.5) × row 2 (eliminates $A_{32}$)

Storing the multiplier $L_{32} = -4.5$ in place of $A_{32}$, we get

$$
\mathbf{A}'' = [\mathbf{L}\backslash\mathbf{U}] =
\begin{bmatrix}
1 & 4 & 1 \\
1 & 2 & -2 \\
2 & -4.5 & -9
\end{bmatrix}
$$

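The decomposition is now complete, with

$$
\mathbf{L} =
\begin{bmatrix}
1 & 0 & 0 \\
1 & 1 & 0 \\
2 & -4.5 & 1
\end{bmatrix}
\qquad
\mathbf{U} =
\begin{bmatrix}
1 & 4 & 1 \\
0 & 2 & -2 \\
0 & 0 & -9
\end{bmatrix}
$$

Solution of Ly = b by forward substitution comes next. The augmented coefficient
form of the equations is

$$
[\mathbf{L}|\mathbf{b}] =
\left[\begin{array}{ccc|c}
1 & 0 & 0 & 7 \\
1 & 1 & 0 & 13 \\
2 & -4.5 & 1 & 5
\end{array}\right]
$$

The solution is

$$
\begin{aligned}
y_1 &= 7 \\
y_2 &= 13 - y_1 = 13 - 7 = 6 \\
y_3 &= 5 - 2y_1 + 4.5y_2 = 5 - 2(7) + 4.5(6) = 18
\end{aligned}
$$

Finally, the equations Ux = y, or

$$
[\mathbf{U}|\mathbf{y}] =
\left[\begin{array}{ccc|c}
1 & 4 & 1 & 7 \\
0 & 2 & -2 & 6 \\
0 & 0 & -9 & 18
\end{array}\right]
$$

are solved by back substitution. This yields

$$
\begin{aligned}
x_3 &= \frac{18}{-9} = -2 \\
x_2 &= \frac{6 + 2x_3}{2} = \frac{6 + 2(-2)}{2} = 1 \\
x_1 &= 7 - 4x_2 - x_3 = 7 - 4(1) - (-2) = 5
\end{aligned}
$$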


EXAMPLE 2.6 

Compute Choleski’s decomposition of the matrix 


$$
\mathbf{A} =
\begin{bmatrix}
4 & -2 & 2 \\
-2 & 2 & -4 \\
2 & -4 & 11
\end{bmatrix}
$$


Solution First we note that A is symmetric. Therefore, Choleski’s decomposition is
applicable, provided that the matrix is also positive definite. An a priori test for
positive definiteness is not needed, since the decomposition algorithm contains its own
test: if the square root of a negative number is encountered, the matrix is not positive
definite and the decomposition fails.

Substituting the given matrix for A in Eq. (2.16), we obtain

$$
\begin{bmatrix}
4 & -2 & 2 \\
-2 & 2 & -4 \\
2 & -4 & 11
\end{bmatrix}
=
\begin{bmatrix}
L_{11}^2 & L_{11}L_{21} & L_{11}L_{31} \\
L_{11}L_{21} & L_{21}^2 + L_{22}^2 & L_{21}L_{31} + L_{22}L_{32} \\
L_{11}L_{31} & L_{21}L_{31} + L_{22}L_{32} & L_{31}^2 + L_{32}^2 + L_{33}^2
\end{bmatrix}
$$

Equating the elements in the lower (or upper) triangular portions yields

$$
\begin{aligned}
L_{11} &= \sqrt{4} = 2 \\
L_{21} &= -2/L_{11} = -2/2 = -1 \\
L_{31} &= 2/L_{11} = 2/2 = 1 \\
L_{22} &= \sqrt{2 - L_{21}^2} = \sqrt{2 - 1^2} = 1 \\
L_{32} &= \frac{-4 - L_{21}L_{31}}{L_{22}} = \frac{-4 - (-1)(1)}{1} = -3 \\
L_{33} &= \sqrt{11 - L_{31}^2 - L_{32}^2} = \sqrt{11 - (1)^2 - (-3)^2} = 1
\end{aligned}
$$

Therefore,

$$
\mathbf{L} =
\begin{bmatrix}
2 & 0 & 0 \\
-1 & 1 & 0 \\
1 & -3 & 1
\end{bmatrix}
$$

The result can easily be verified by performing the multiplication $\mathbf{L}\mathbf{L}^T$.


EXAMPLE 2.7 

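Solve AX = B with Doolittle’s decomposition and compute |A|, where

$$
\mathbf{A} =
\begin{bmatrix}
3 & -1 & 4 \\
-2 & 0 & 5 \\
7 & 2 & -2
\end{bmatrix}
\qquad
\mathbf{B} =
\begin{bmatrix}
6 & -4 \\
3 & 2 \\
7 & -5
\end{bmatrix}
$$
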

Solution In the program below the coefficient matrix A is first decomposed by calling 
LUdec. Then LUsol is used to compute the solution one vector at a time. 

% Example 2.7 (Doolittle's decomposition)
A = [3 -1 4; -2 0 5; 7 2 -2];
B = [6 -4; 3 2; 7 -5];
A = LUdec(A);
det = prod(diag(A))
for i = 1:size(B,2)
    X(:,i) = LUsol(A,B(:,i));
end
X


Here are the results:

>> det =
   -77
X =
    1.0000   -1.0000
    1.0000    1.0000
    1.0000    0.0000


EXAMPLE 2.8 

Test the function choleski by decomposing

$$
\mathbf{A} =
\begin{bmatrix}
1.44 & -0.36 & 5.52 & 0.00 \\
-0.36 & 10.33 & -7.78 & 0.00 \\
5.52 & -7.78 & 28.40 & 9.00 \\
0.00 & 0.00 & 9.00 & 61.00
\end{bmatrix}
$$

Solution

% Example 2.8 (Choleski decomposition)
A = [1.44 -0.36  5.52  0.00;
    -0.36 10.33 -7.78  0.00;
     5.52 -7.78 28.40  9.00;
     0.00  0.00  9.00 61.00];
L = choleski(A)
Check = L*L'      % Verify the result

>> L =
    1.2000         0         0         0
   -0.3000    3.2000         0         0
    4.6000   -2.0000    1.8000         0
         0         0    5.0000    6.0000
Check =
    1.4400   -0.3600    5.5200         0
   -0.3600   10.3300   -7.7800         0
    5.5200   -7.7800   28.4000    9.0000
         0         0    9.0000   61.0000


PROBLEM SET 2.1 

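1. By evaluating the determinant, classify the following matrices as singular,
ill-conditioned or well-conditioned.

$$
\text{(a)}\ \mathbf{A} =
\begin{bmatrix}
1 & 2 & 3 \\
2 & 3 & 4 \\
3 & 4 & 5
\end{bmatrix}
\qquad
\text{(b)}\ \mathbf{A} =
\begin{bmatrix}
2.11 & -0.80 & 1.72 \\
-1.84 & 3.03 & 1.29 \\
-1.57 & 5.25 & 4.30
\end{bmatrix}
$$

$$
\text{(c)}\ \mathbf{A} =
\begin{bmatrix}
2 & -1 & 0 \\
-1 & 2 & -1 \\
0 & -1 & 2
\end{bmatrix}
\qquad
\text{(d)}\ \mathbf{A} =
\begin{bmatrix}
4 & 3 & -1 \\
7 & -2 & 3 \\
5 & -18 & 13
\end{bmatrix}
$$
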

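2. Given the LU decomposition A = LU, determine A and |A|.

$$
\text{(a)}\ \mathbf{L} =
\begin{bmatrix}
1 & 0 & 0 \\
1 & 1 & 0 \\
1 & 5/3 & 1
\end{bmatrix}
\qquad
\mathbf{U} =
\begin{bmatrix}
1 & 2 & 4 \\
0 & 3 & 21 \\
0 & 0 & 0
\end{bmatrix}
$$

$$
\text{(b)}\ \mathbf{L} =
\begin{bmatrix}
2 & 0 & 0 \\
-1 & 1 & 0 \\
1 & -3 & 1
\end{bmatrix}
\qquad
\mathbf{U} =
\begin{bmatrix}
2 & -1 & 1 \\
0 & 1 & -3 \\
0 & 0 & 1
\end{bmatrix}
$$
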
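3. Utilize the results of LU decomposition

$$
\mathbf{A} = \mathbf{L}\mathbf{U} =
\begin{bmatrix}
1 & 0 & 0 \\
3/2 & 1 & 0 \\
1/2 & 11/13 & 1
\end{bmatrix}
\begin{bmatrix}
2 & -3 & -1 \\
0 & 13/2 & -7/2 \\
0 & 0 & 32/13
\end{bmatrix}
$$

to solve Ax = b, where $\mathbf{b}^T = \begin{bmatrix} 1 & -1 & 2 \end{bmatrix}$.

4. Use Gauss elimination to solve the equations Ax = b, where

$$
\mathbf{A} =
\begin{bmatrix}
2 & -3 & -1 \\
3 & 2 & -5 \\
2 & 4 & -1
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
3 \\ -9 \\ -5
\end{bmatrix}
$$

5. Solve the equations AX = B by Gauss elimination, where

$$
\mathbf{A} =
\begin{bmatrix}
2 & 0 & -1 & 0 \\
0 & 1 & 2 & 0 \\
-1 & 2 & 0 & 1 \\
0 & 0 & 1 & -2
\end{bmatrix}
\qquad
\mathbf{B} =
\begin{bmatrix}
1 & 0 \\
0 & 0 \\
0 & 1 \\
0 & 0
\end{bmatrix}
$$

6. Solve the equations Ax = b by Gauss elimination, where

$$
\mathbf{A} =
\begin{bmatrix}
0 & 0 & 2 & 1 & 2 \\
0 & 1 & 0 & 2 & -1 \\
1 & 2 & 0 & -2 & 0 \\
0 & 0 & 0 & -1 & 1 \\
0 & 1 & -1 & 1 & -1
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
1 \\ 1 \\ -4 \\ -2 \\ -1
\end{bmatrix}
$$

Hint: reorder the equations before solving.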

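7. Find L and U so that

$$
\mathbf{A} = \mathbf{L}\mathbf{U} =
\begin{bmatrix}
4 & -1 & 0 \\
-1 & 4 & -1 \\
0 & -1 & 4
\end{bmatrix}
$$

using (a) Doolittle’s decomposition; (b) Choleski’s decomposition.

8. Use Doolittle’s decomposition method to solve Ax = b, where

$$
\mathbf{A} =
\begin{bmatrix}
-3 & 6 & -4 \\
9 & -8 & 24 \\
-12 & 24 & -26
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
-3 \\ 65 \\ -42
\end{bmatrix}
$$

9. Solve the equations Ax = b by Doolittle’s decomposition method, where

$$
\mathbf{A} =
\begin{bmatrix}
2.34 & -4.10 & 1.78 \\
-1.98 & 3.47 & -2.22 \\
2.36 & -15.17 & 6.18
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
0.02 \\ -0.73 \\ -6.63
\end{bmatrix}
$$

10. Solve the equations AX = B by Doolittle’s decomposition method, where

$$
\mathbf{A} =
\begin{bmatrix}
4 & -3 & 6 \\
8 & -3 & 10 \\
-4 & 12 & -10
\end{bmatrix}
\qquad
\mathbf{B} =
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
0 & 0
\end{bmatrix}
$$

11. Solve the equations Ax = b by Choleski’s decomposition method, where

$$
\mathbf{A} =
\begin{bmatrix}
1 & 1 & 1 \\
1 & 2 & 2 \\
1 & 2 & 3
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
1 \\ 3/2 \\ 3
\end{bmatrix}
$$
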

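12. Solve the equations

$$
\begin{bmatrix}
4 & -2 & -3 \\
12 & 4 & -10 \\
-16 & 28 & 18
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ x_3
\end{bmatrix}
=
\begin{bmatrix}
1.1 \\ 0 \\ -2.3
\end{bmatrix}
$$

by Doolittle’s decomposition method.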

13. Determine L that results from Choleski’s decomposition of the diagonal matrix

$$
\mathbf{A} =
\begin{bmatrix}
\alpha_1 & 0 & 0 & \cdots \\
0 & \alpha_2 & 0 & \cdots \\
0 & 0 & \alpha_3 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$

14. ■ Modify the function gauss so that it will work with m constant vectors. Test the
program by solving AX = B, where

$$
\mathbf{A} =
\begin{bmatrix}
2 & -1 & 0 \\
-1 & 2 & -1 \\
0 & -1 & 1
\end{bmatrix}
\qquad
\mathbf{B} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

15. ■ A well-known example of an ill-conditioned matrix is the Hilbert matrix

$$
\mathbf{A} =
\begin{bmatrix}
1 & 1/2 & 1/3 & \cdots \\
1/2 & 1/3 & 1/4 & \cdots \\
1/3 & 1/4 & 1/5 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$

Write a program that specializes in solving the equations Ax = b by Doolittle’s
decomposition method, where A is the Hilbert matrix of arbitrary size n × n, and

$$
b_i = \sum_{j=1}^{n} A_{ij}
$$

The program should have no input apart from n. By running the program,
determine the largest n for which the solution is within 6 significant figures of the
exact solution

$$
\mathbf{x} = \begin{bmatrix} 1 & 1 & 1 & \cdots \end{bmatrix}^T
$$

(the results depend on the software and the hardware used).

16. ■ Write a function for the solution phase of Choleski’s decomposition method.
Test the function by solving the equations Ax = b, where

$$
\mathbf{A} =
\begin{bmatrix}
4 & -2 & 2 \\
-2 & 2 & -4 \\
2 & -4 & 11
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
6 \\ -10 \\ 27
\end{bmatrix}
$$

Use the function choleski for the decomposition phase.

17. ■ Determine the coefficients of the polynomial $y = a_0 + a_1x + a_2x^2 + a_3x^3$ that
passes through the points (0, 10), (1, 35), (3, 31) and (4, 2).

18. ■ Determine the 4th degree polynomial y(x) that passes through the points 
(0, -1), (1, 1), (3, 3), (5, 2) and (6, -2). 

19. ■ Find the 4th degree polynomial y(x) that passes through the points (0,1), 
(0.75, —0.25) and (1,1), and has zero curvature at (0,1) and (1,1). 

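20. ■ Solve the equations Ax = b, where

$$
\mathbf{A} =
\begin{bmatrix}
3.50 & 2.77 & -0.76 & 1.80 \\
-1.80 & 2.68 & 3.44 & -0.09 \\
0.27 & 5.07 & 6.90 & 1.61 \\
1.71 & 5.45 & 2.68 & 1.71
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
7.31 \\ 4.23 \\ 13.85 \\ 11.55
\end{bmatrix}
$$

By computing |A| and Ax, comment on the accuracy of the solution.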


2.4 Symmetric and Banded Coefficient Matrices 
Introduction 

Engineering problems often lead to coefficient matrices that are sparsely populated,
meaning that most elements of the matrix are zero. If all the nonzero terms are
clustered about the leading diagonal, then the matrix is said to be banded. An example of
a banded matrix is

$$
\mathbf{A} =
\begin{bmatrix}
X & X & 0 & 0 & 0 \\
X & X & X & 0 & 0 \\
0 & X & X & X & 0 \\
0 & 0 & X & X & X \\
0 & 0 & 0 & X & X
\end{bmatrix}
$$



where X’s denote the nonzero elements that form the populated band (some of these 
elements may be zero). All the elements lying outside the band are zero. The matrix 
shown above has a bandwidth of three, since there are at most three nonzero elements 
in each row (or column). Such a matrix is called tridiagonal. 

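If a banded matrix is decomposed in the form A = LU, both L and U will retain
the banded structure of A. For example, if we decomposed the matrix shown above,
we would get

$$
\mathbf{L} =
\begin{bmatrix}
X & 0 & 0 & 0 & 0 \\
X & X & 0 & 0 & 0 \\
0 & X & X & 0 & 0 \\
0 & 0 & X & X & 0 \\
0 & 0 & 0 & X & X
\end{bmatrix}
\qquad
\mathbf{U} =
\begin{bmatrix}
X & X & 0 & 0 & 0 \\
0 & X & X & 0 & 0 \\
0 & 0 & X & X & 0 \\
0 & 0 & 0 & X & X \\
0 & 0 & 0 & 0 & X
\end{bmatrix}
$$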


The banded structure of a coefficient matrix can be exploited to save storage and 
computation time. If the coefficient matrix is also symmetric, further economies are 
possible. In this article we show how the methods of solution discussed previously 
can be adapted for banded and symmetric coefficient matrices. 


Tridiagonal Coefficient Matrix 

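Consider the solution of Ax = b by Doolittle’s decomposition, where A is the n × n
tridiagonal matrix

$$
\mathbf{A} =
\begin{bmatrix}
d_1 & e_1 & 0 & 0 & \cdots & 0 \\
c_1 & d_2 & e_2 & 0 & \cdots & 0 \\
0 & c_2 & d_3 & e_3 & \cdots & 0 \\
0 & 0 & c_3 & d_4 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & c_{n-1} & d_n
\end{bmatrix}
$$

As the notation implies, we are storing the nonzero elements of A in the vectors

$$
\mathbf{c} =
\begin{bmatrix}
c_1 \\ c_2 \\ \vdots \\ c_{n-1}
\end{bmatrix}
\qquad
\mathbf{d} =
\begin{bmatrix}
d_1 \\ d_2 \\ \vdots \\ d_{n-1} \\ d_n
\end{bmatrix}
\qquad
\mathbf{e} =
\begin{bmatrix}
e_1 \\ e_2 \\ \vdots \\ e_{n-1}
\end{bmatrix}
$$
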

The resulting saving of storage can be significant. For example, a 100 × 100 tridiagonal
matrix, containing 10,000 elements, can be stored in only 99 + 100 + 99 = 298
locations, which represents a compression ratio of about 33:1.
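
The storage scheme can be made concrete with a few lines of MATLAB. The sketch below is illustrative only: the size n and the numerical values are assumptions, and the full matrix is assembled with diag merely to show which entry each vector element represents.

```matlab
% Sketch: three-vector storage of a tridiagonal matrix.
% n and the numerical values are assumptions chosen for illustration.
n = 5;
c = -ones(n-1,1);                % subdiagonal:   c(k) = A(k+1,k)
d = 2*ones(n,1);                 % diagonal:      d(k) = A(k,k)
e = -ones(n-1,1);                % superdiagonal: e(k) = A(k,k+1)
A = diag(d) + diag(c,-1) + diag(e,1);    % full matrix, for comparison only
nFull = numel(A);                        % n^2 storage locations
nBand = numel(c) + numel(d) + numel(e);  % 3n - 2 storage locations
```

For n = 100 the same two counts give the 10,000 and 298 locations quoted above.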

We now apply LU decomposition to the coefficient matrix. We reduce row k by
getting rid of $c_{k-1}$ with the elementary operation

    row k ← row k − ($c_{k-1}/d_{k-1}$) × row (k − 1), k = 2, 3, ..., n

The corresponding change in $d_k$ is

$$
d_k \leftarrow d_k - (c_{k-1}/d_{k-1})\,e_{k-1} \tag{2.21}
$$

whereas $e_k$ is not affected. In order to finish up with Doolittle’s decomposition of the
form [L\U], we store the multiplier $\lambda = c_{k-1}/d_{k-1}$ in the location previously occupied
by $c_{k-1}$:

$$
c_{k-1} \leftarrow c_{k-1}/d_{k-1} \tag{2.22}
$$

Thus the decomposition algorithm is 

for k = 2:n
    lambda = c(k-1)/d(k-1);
    d(k) = d(k) - lambda*e(k-1);
    c(k-1) = lambda;
end


Next we look at the solution phase, i.e., the solution of Ly = b, followed by
Ux = y. The equations Ly = b can be portrayed by the augmented coefficient matrix

$$
[\mathbf{L}|\mathbf{b}] =
\left[\begin{array}{cccccc|c}
1 & 0 & 0 & 0 & \cdots & 0 & b_1 \\
c_1 & 1 & 0 & 0 & \cdots & 0 & b_2 \\
0 & c_2 & 1 & 0 & \cdots & 0 & b_3 \\
0 & 0 & c_3 & 1 & \cdots & 0 & b_4 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & c_{n-1} & 1 & b_n
\end{array}\right]
$$

Note that the original contents of c were destroyed and replaced by the multipliers
during the decomposition. The solution algorithm for y by forward substitution is

y(1) = b(1);
for k = 2:n
    y(k) = b(k) - c(k-1)*y(k-1);
end


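The augmented coefficient matrix representing Ux = y is

$$
[\mathbf{U}|\mathbf{y}] =
\left[\begin{array}{cccccc|c}
d_1 & e_1 & 0 & \cdots & 0 & 0 & y_1 \\
0 & d_2 & e_2 & \cdots & 0 & 0 & y_2 \\
0 & 0 & d_3 & \cdots & 0 & 0 & y_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & d_{n-1} & e_{n-1} & y_{n-1} \\
0 & 0 & 0 & \cdots & 0 & d_n & y_n
\end{array}\right]
$$
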



Note again that the contents of d were altered from the original values during the 
decomposition phase (but e was unchanged). The solution for x is obtained by back 
substitution using the algorithm 

x(n) = y(n)/d(n);
for k = n-1:-1:1
    x(k) = (y(k) - e(k)*x(k+1))/d(k);
end


■ LUdec3 

The function LUdec3 contains the code for the decomposition phase. The original 
vectors c and d are destroyed and replaced by the vectors of the decomposed matrix. 

function [c,d,e] = LUdec3(c,d,e)
% LU decomposition of tridiagonal matrix A = [c\d\e].
% USAGE: [c,d,e] = LUdec3(c,d,e)
n = length(d);
for k = 2:n
    lambda = c(k-1)/d(k-1);
    d(k) = d(k) - lambda*e(k-1);
    c(k-1) = lambda;
end










■ LUsol3 

This is the function for the solution phase. The vector y overwrites the constant vector 
b during the forward substitution. Similarly, the solution vector x replaces y in the 
back substitution process. 

function x = LUsol3(c,d,e,b)
% Solves A*x = b where A = [c\d\e] is the LU
% decomposition of the original tridiagonal A.
% USAGE: x = LUsol3(c,d,e,b)
n = length(d);
for k = 2:n                      % Forward substitution
    b(k) = b(k) - c(k-1)*b(k-1);
end
b(n) = b(n)/d(n);                % Back substitution
for k = n-1:-1:1
    b(k) = (b(k) - e(k)*b(k+1))/d(k);
end
x = b;
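
The two tridiagonal routines can be exercised on a small system whose solution is known in advance. The following sketch is a hedged illustration: it repeats the loops of LUdec3 and LUsol3 inline so that it runs on its own, and the test system (the familiar −1, 2, −1 matrix with a right-hand side that produces a solution of all ones) is an assumption chosen for the check.

```matlab
% Sketch: tridiagonal decomposition and solution, loops written inline.
% The test system is an assumption; its exact solution is x = ones(n,1).
n = 6;
c = -ones(n-1,1);  d = 2*ones(n,1);  e = -ones(n-1,1);
b = zeros(n,1);  b(1) = 1;  b(n) = 1;
A = diag(d) + diag(c,-1) + diag(e,1);    % full matrix, used only for the check
b0 = b;
for k = 2:n                      % decomposition phase (as in LUdec3)
    lambda = c(k-1)/d(k-1);
    d(k) = d(k) - lambda*e(k-1);
    c(k-1) = lambda;
end
for k = 2:n                      % forward substitution (as in LUsol3)
    b(k) = b(k) - c(k-1)*b(k-1);
end
b(n) = b(n)/d(n);                % back substitution
for k = n-1:-1:1
    b(k) = (b(k) - e(k)*b(k+1))/d(k);
end
x = b;
resid = norm(A*x - b0);          % residual of the original equations
```

The work grows only linearly with n, which is the practical payoff of exploiting the band structure.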

Symmetric Coefficient Matrices 

More often than not, coefficient matrices that arise in engineering problems are
symmetric as well as banded. Therefore, it is worthwhile to discover special
properties of such matrices, and learn how to utilize them in the construction of efficient
algorithms.

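If the matrix A is symmetric, then the LU decomposition can be presented in the
form

$$
\mathbf{A} = \mathbf{L}\mathbf{U} = \mathbf{L}\mathbf{D}\mathbf{L}^T \tag{2.23}
$$

where D is a diagonal matrix. An example is Choleski’s decomposition $\mathbf{A} = \mathbf{L}\mathbf{L}^T$ that
was discussed in the previous article (in this case D = I). For Doolittle’s decomposition
we have

$$
\mathbf{U} = \mathbf{D}\mathbf{L}^T =
\begin{bmatrix}
D_1 & 0 & 0 & \cdots & 0 \\
0 & D_2 & 0 & \cdots & 0 \\
0 & 0 & D_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & D_n
\end{bmatrix}
\begin{bmatrix}
1 & L_{21} & L_{31} & \cdots & L_{n1} \\
0 & 1 & L_{32} & \cdots & L_{n2} \\
0 & 0 & 1 & \cdots & L_{n3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
$$

which gives

$$
\mathbf{U} =
\begin{bmatrix}
D_1 & D_1L_{21} & D_1L_{31} & \cdots & D_1L_{n1} \\
0 & D_2 & D_2L_{32} & \cdots & D_2L_{n2} \\
0 & 0 & D_3 & \cdots & D_3L_{n3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & D_n
\end{bmatrix}
\tag{2.24}
$$
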

We see that during decomposition of a symmetric matrix only U has to be stored, since 
D and L can be easily recovered from U. Thus Gauss elimination, which results in an 
upper triangular matrix of the form shown in Eq. (2.24), is sufficient to decompose a 
symmetric matrix. 

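There is an alternative storage scheme that can be employed during LU
decomposition. The idea is to arrive at the matrix

$$
\mathbf{U}^* =
\begin{bmatrix}
D_1 & L_{21} & L_{31} & \cdots & L_{n1} \\
0 & D_2 & L_{32} & \cdots & L_{n2} \\
0 & 0 & D_3 & \cdots & L_{n3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & D_n
\end{bmatrix}
\tag{2.25}
$$

Here U can be recovered from $U_{ij} = D_iL_{ji}$. It turns out that this scheme leads to a
computationally more efficient solution phase; therefore, we adopt it for symmetric,
banded matrices.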
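
The relationship between U, D, L and the compact form of Eq. (2.25) can be checked numerically. The sketch below is illustrative: the symmetric test matrix and the names Ustar and errSym are assumptions, and plain Gauss elimination without pivoting is written inline.

```matlab
% Sketch: recovering D and L from the upper triangular matrix produced
% by Gauss elimination of a symmetric matrix (test matrix is an assumption).
A = [4 -2 1; -2 4 -2; 1 -2 4];
n = size(A,1);
U = A;
for k = 1:n-1                    % Gauss elimination, no pivoting
    for i = k+1:n
        lambda = U(i,k)/U(k,k);
        U(i,k+1:n) = U(i,k+1:n) - lambda*U(k,k+1:n);
        U(i,k) = 0;              % eliminated term
    end
end
D = diag(diag(U));               % pivots form the diagonal matrix D
L = (D\U)';                      % row i of U divided by D_i is row i of L'
errSym = norm(L*D*L' - A);       % should be at roundoff level
Ustar = D + triu(L',1);          % compact storage in the form of Eq. (2.25)
```

Dividing each row of U by its pivot is all that is needed to obtain the transpose of L, which is why storing U alone suffices for a symmetric matrix.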


Symmetric, Pentadiagonal Coefficient Matrix 

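We encounter pentadiagonal (bandwidth = 5) coefficient matrices in the solution of
fourth-order, ordinary differential equations by finite differences. Often these matrices
are symmetric, in which case an n × n matrix has the form

$$
\mathbf{A} =
\begin{bmatrix}
d_1 & e_1 & f_1 & 0 & 0 & 0 & \cdots & 0 \\
e_1 & d_2 & e_2 & f_2 & 0 & 0 & \cdots & 0 \\
f_1 & e_2 & d_3 & e_3 & f_3 & 0 & \cdots & 0 \\
0 & f_2 & e_3 & d_4 & e_4 & f_4 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & f_{n-4} & e_{n-3} & d_{n-2} & e_{n-2} & f_{n-2} \\
0 & \cdots & 0 & 0 & f_{n-3} & e_{n-2} & d_{n-1} & e_{n-1} \\
0 & \cdots & 0 & 0 & 0 & f_{n-2} & e_{n-1} & d_n
\end{bmatrix}
\tag{2.26}
$$

As in the case of tridiagonal matrices, we store the nonzero elements in the three
vectors

$$
\mathbf{d} =
\begin{bmatrix}
d_1 \\ d_2 \\ \vdots \\ d_{n-2} \\ d_{n-1} \\ d_n
\end{bmatrix}
\qquad
\mathbf{e} =
\begin{bmatrix}
e_1 \\ e_2 \\ \vdots \\ e_{n-2} \\ e_{n-1}
\end{bmatrix}
\qquad
\mathbf{f} =
\begin{bmatrix}
f_1 \\ f_2 \\ \vdots \\ f_{n-2}
\end{bmatrix}
$$
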


Let us now look at the solution of the equations Ax = b by Doolittle’s decomposition.
The first step is to transform A to upper triangular form by Gauss elimination. If
elimination has progressed to the stage where the kth row has become the pivot row,
we have the following situation:

$$
\begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & 0 & d_k & e_k & f_k & 0 & 0 & 0 & \cdots \\
\cdots & 0 & e_k & d_{k+1} & e_{k+1} & f_{k+1} & 0 & 0 & \cdots \\
\cdots & 0 & f_k & e_{k+1} & d_{k+2} & e_{k+2} & f_{k+2} & 0 & \cdots \\
\cdots & 0 & 0 & f_{k+1} & e_{k+2} & d_{k+3} & e_{k+3} & f_{k+3} & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$

The elements $e_k$ and $f_k$ below the pivot row are eliminated by the operations

    row (k + 1) ← row (k + 1) − ($e_k/d_k$) × row k
    row (k + 2) ← row (k + 2) − ($f_k/d_k$) × row k

The only terms (other than those being eliminated) that are changed by the above
operations are

$$
\begin{aligned}
d_{k+1} &\leftarrow d_{k+1} - (e_k/d_k)\,e_k \\
e_{k+1} &\leftarrow e_{k+1} - (e_k/d_k)\,f_k \\
d_{k+2} &\leftarrow d_{k+2} - (f_k/d_k)\,f_k
\end{aligned}
\tag{2.27a}
$$

Storage of the multipliers in the upper triangular portion of the matrix results in

$$
e_k \leftarrow e_k/d_k \qquad f_k \leftarrow f_k/d_k \tag{2.27b}
$$

At the conclusion of the elimination phase the matrix has the form (do not confuse d,
e and f with the original contents of A)

$$
\mathbf{U}^* =
\begin{bmatrix}
d_1 & e_1 & f_1 & 0 & \cdots & 0 \\
0 & d_2 & e_2 & f_2 & \cdots & 0 \\
0 & 0 & d_3 & e_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & d_{n-1} & e_{n-1} \\
0 & 0 & \cdots & 0 & 0 & d_n
\end{bmatrix}
$$



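Next comes the solution phase. The equations Ly = b have the augmented
coefficient matrix

$$
[\mathbf{L}|\mathbf{b}] =
\left[\begin{array}{cccccc|c}
1 & 0 & 0 & 0 & \cdots & 0 & b_1 \\
e_1 & 1 & 0 & 0 & \cdots & 0 & b_2 \\
f_1 & e_2 & 1 & 0 & \cdots & 0 & b_3 \\
0 & f_2 & e_3 & 1 & \cdots & 0 & b_4 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & f_{n-2} & e_{n-1} & 1 & b_n
\end{array}\right]
$$

Solution by forward substitution yields

$$
\begin{aligned}
y_1 &= b_1 \\
y_2 &= b_2 - e_1y_1 \\
&\;\,\vdots \\
y_k &= b_k - f_{k-2}y_{k-2} - e_{k-1}y_{k-1}, \quad k = 3, 4, \ldots, n
\end{aligned}
\tag{2.28}
$$
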
The equations to be solved by back substitution, namely Ux = y, have the augmented
coefficient matrix

$$
[\mathbf{U}|\mathbf{y}] =
\left[\begin{array}{cccccc|c}
d_1 & d_1e_1 & d_1f_1 & 0 & \cdots & 0 & y_1 \\
0 & d_2 & d_2e_2 & d_2f_2 & \cdots & 0 & y_2 \\
0 & 0 & d_3 & d_3e_3 & \cdots & 0 & y_3 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & d_{n-1} & d_{n-1}e_{n-1} & y_{n-1} \\
0 & 0 & \cdots & 0 & 0 & d_n & y_n
\end{array}\right]
$$

the solution of which is obtained by back substitution:

$$
\begin{aligned}
x_n &= y_n/d_n \\
x_{n-1} &= y_{n-1}/d_{n-1} - e_{n-1}x_n \\
x_k &= y_k/d_k - e_kx_{k+1} - f_kx_{k+2}, \quad k = n-2, n-3, \ldots, 1
\end{aligned}
\tag{2.29}
$$

■ LUdec5

The function LUdec5 decomposes a symmetric, pentadiagonal matrix A stored in the
form A = [f\e\d\e\f]. The original vectors d, e and f are destroyed and replaced by
the vectors of the decomposed matrix.

function [d,e,f] = LUdec5(d,e,f)
% LU decomposition of pentadiagonal matrix A = [f\e\d\e\f].
% USAGE: [d,e,f] = LUdec5(d,e,f)
n = length(d);
for k = 1:n-2
    lambda = e(k)/d(k);
    d(k+1) = d(k+1) - lambda*e(k);
    e(k+1) = e(k+1) - lambda*f(k);
    e(k) = lambda;
    lambda = f(k)/d(k);
    d(k+2) = d(k+2) - lambda*f(k);
    f(k) = lambda;
end
lambda = e(n-1)/d(n-1);
d(n) = d(n) - lambda*e(n-1);
e(n-1) = lambda;

■ LUsol5 

LUsol5 is the function for the solution phase. As in LUsol3, the vector y overwrites
the constant vector b during forward substitution and x replaces y during back
substitution.

function x = LUsol5(d,e,f,b)
% Solves A*x = b where A = [f\e\d\e\f] is the LU
% decomposition of the original pentadiagonal A.
% USAGE: x = LUsol5(d,e,f,b)
n = length(d);
b(2) = b(2) - e(1)*b(1);         % Forward substitution
for k = 3:n
    b(k) = b(k) - e(k-1)*b(k-1) - f(k-2)*b(k-2);
end
b(n) = b(n)/d(n);                % Back substitution
b(n-1) = b(n-1)/d(n-1) - e(n-1)*b(n);
for k = n-2:-1:1
    b(k) = b(k)/d(k) - e(k)*b(k+1) - f(k)*b(k+2);
end
x = b;


EXAMPLE 2.9 

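As a result of Gauss elimination, a symmetric matrix A was transformed to the upper
triangular form

$$
\mathbf{U} =
\begin{bmatrix}
4 & -2 & 1 & 0 \\
0 & 3 & -3/2 & 1 \\
0 & 0 & 3 & -3/2 \\
0 & 0 & 0 & 35/12
\end{bmatrix}
$$

Determine the original matrix A.

Solution First we find L in the decomposition A = LU. Dividing each row of U by its
diagonal element yields

$$
\mathbf{L}^T =
\begin{bmatrix}
1 & -1/2 & 1/4 & 0 \\
0 & 1 & -1/2 & 1/3 \\
0 & 0 & 1 & -1/2 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

Therefore, A = LU becomes

$$
\mathbf{A} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
-1/2 & 1 & 0 & 0 \\
1/4 & -1/2 & 1 & 0 \\
0 & 1/3 & -1/2 & 1
\end{bmatrix}
\begin{bmatrix}
4 & -2 & 1 & 0 \\
0 & 3 & -3/2 & 1 \\
0 & 0 & 3 & -3/2 \\
0 & 0 & 0 & 35/12
\end{bmatrix}
=
\begin{bmatrix}
4 & -2 & 1 & 0 \\
-2 & 4 & -2 & 1 \\
1 & -2 & 4 & -2 \\
0 & 1 & -2 & 4
\end{bmatrix}
$$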

EXAMPLE 2.10 

Determine L and D that result from Doolittle’s decomposition $\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^T$ of the
symmetric matrix

$$
\mathbf{A} =
\begin{bmatrix}
3 & -3 & 3 \\
-3 & 5 & 1 \\
3 & 1 & 10
\end{bmatrix}
$$

Solution We use Gauss elimination, storing the multipliers in the upper triangular
portion of A. At the completion of elimination, the matrix will have the form of U* in
Eq. (2.25).

The terms to be eliminated in the first pass are $A_{21}$ and $A_{31}$ using the elementary
operations

    row 2 ← row 2 − (−1) × row 1
    row 3 ← row 3 − (1) × row 1

Storing the multipliers (−1 and 1) in the locations occupied by $A_{12}$ and $A_{13}$, we get

$$
\mathbf{A}' =
\begin{bmatrix}
3 & -1 & 1 \\
0 & 2 & 4 \\
0 & 4 & 7
\end{bmatrix}
$$

The second pass is the operation

    row 3 ← row 3 − 2 × row 2

which yields after overwriting $A_{23}$ with the multiplier 2

$$
\mathbf{A}'' = [\mathbf{0}\backslash\mathbf{D}\backslash\mathbf{L}^T] =
\begin{bmatrix}
3 & -1 & 1 \\
0 & 2 & 2 \\
0 & 0 & -1
\end{bmatrix}
$$

Hence

$$
\mathbf{L} =
\begin{bmatrix}
1 & 0 & 0 \\
-1 & 1 & 0 \\
1 & 2 & 1
\end{bmatrix}
\qquad
\mathbf{D} =
\begin{bmatrix}
3 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & -1
\end{bmatrix}
$$



EXAMPLE 2.11 

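Solve Ax = b, where

$$
\mathbf{A} =
\begin{bmatrix}
6 & -4 & 1 & 0 & 0 & \cdots \\
-4 & 6 & -4 & 1 & 0 & \cdots \\
1 & -4 & 6 & -4 & 1 & \cdots \\
 & \ddots & \ddots & \ddots & \ddots & \ddots \\
\cdots & 0 & 1 & -4 & 6 & -4 \\
\cdots & 0 & 0 & 1 & -4 & 7
\end{bmatrix}
\qquad
\mathbf{x} =
\begin{bmatrix}
x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_9 \\ x_{10}
\end{bmatrix}
\qquad
\mathbf{b} =
\begin{bmatrix}
3 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 4
\end{bmatrix}
$$

Solution As the coefficient matrix is symmetric and pentadiagonal, we utilize the
functions LUdec5 and LUsol5:

% Example 2.11 (Solution of pentadiagonal eqs.)
n = 10;
d = 6*ones(n,1); d(n) = 7;
e = -4*ones(n-1,1);
f = ones(n-2,1);
b = zeros(n,1); b(1) = 3; b(n) = 4;
[d,e,f] = LUdec5(d,e,f);
x = LUsol5(d,e,f,b)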

The output from the program is

>> x =
    2.3872
    4.1955
    5.4586
    6.2105
    6.4850
    6.3158
    5.7368
    4.7820
    3.4850
    1.8797
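
As a cross-check of Example 2.11, the same system can be assembled as a full matrix and handed to MATLAB's general-purpose backslash solver. This sketch is an independent verification, not part of the banded algorithm; the names A, resid and the use of diag to build the matrix are choices made here for illustration.

```matlab
% Sketch: cross-check of Example 2.11 using the full 10 x 10 matrix.
n = 10;
d = 6*ones(n,1);  d(n) = 7;      % main diagonal
e = -4*ones(n-1,1);              % first off-diagonals
f = ones(n-2,1);                 % second off-diagonals
A = diag(d) + diag(e,1) + diag(e,-1) + diag(f,2) + diag(f,-2);
b = zeros(n,1);  b(1) = 3;  b(n) = 4;
x = A\b;                         % general solver, ignores the band structure
resid = norm(A*x - b);           % residual, roundoff level expected
```

The components of x computed this way should agree with the values listed above to the digits shown.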

2.5 Pivoting 

Introduction 

Sometimes the order in which the equations are presented to the solution algorithm
has a significant effect on the results. For example, consider the equations

$$
\begin{aligned}
2x_1 - x_2 &= 1 \\
-x_1 + 2x_2 - x_3 &= 0 \\
-x_2 + x_3 &= 0
\end{aligned}
$$

The corresponding augmented coefficient matrix is

$$
[\mathbf{A}|\mathbf{b}] =
\left[\begin{array}{ccc|c}
2 & -1 & 0 & 1 \\
-1 & 2 & -1 & 0 \\
0 & -1 & 1 & 0
\end{array}\right]
\tag{a}
$$

Equations (a) are in the “right order” in the sense that we would have no trouble
obtaining the correct solution $x_1 = x_2 = x_3 = 1$ by Gauss elimination or LU
decomposition. Now suppose that we exchange the first and third equations, so that the
augmented coefficient matrix becomes

$$
[\mathbf{A}|\mathbf{b}] =
\left[\begin{array}{ccc|c}
0 & -1 & 1 & 0 \\
-1 & 2 & -1 & 0 \\
2 & -1 & 0 & 1
\end{array}\right]
\tag{b}
$$

Since we did not change the equations (only their order was altered), the solution is still
$x_1 = x_2 = x_3 = 1$. However, Gauss elimination fails immediately due to the presence
of the zero pivot element (the element $A_{11}$).

The above example demonstrates that it is sometimes essential to reorder the
equations during the elimination phase. The reordering, or row pivoting, is also
required if the pivot element is not zero, but very small in comparison to other elements
in the pivot row, as demonstrated by the following set of equations:

$$
[\mathbf{A}|\mathbf{b}] =
\left[\begin{array}{ccc|c}
\varepsilon & -1 & 1 & 0 \\
-1 & 2 & -1 & 0 \\
2 & -1 & 0 & 1
\end{array}\right]
\tag{c}
$$

These equations are the same as Eqs. (b), except that the small number ε replaces the
zero element $A_{11}$ in Eq. (b). Therefore, if we let ε → 0, the solutions of Eqs. (b) and (c)
should become identical. After the first phase of Gauss elimination, the augmented
coefficient matrix becomes

$$
[\mathbf{A}'|\mathbf{b}'] =
\left[\begin{array}{ccc|c}
\varepsilon & -1 & 1 & 0 \\
0 & 2 - 1/\varepsilon & -1 + 1/\varepsilon & 0 \\
0 & -1 + 2/\varepsilon & -2/\varepsilon & 1
\end{array}\right]
\tag{d}
$$

Because the computer works with a fixed word length, all numbers are rounded off
to a finite number of significant figures. If ε is very small, then 1/ε is huge, and an
element such as 2 − 1/ε is rounded to −1/ε. Therefore, for sufficiently small ε, the
Eqs. (d) are actually stored as

$$
[\mathbf{A}'|\mathbf{b}'] =
\left[\begin{array}{ccc|c}
\varepsilon & -1 & 1 & 0 \\
0 & -1/\varepsilon & 1/\varepsilon & 0 \\
0 & 2/\varepsilon & -2/\varepsilon & 1
\end{array}\right]
$$

Because the second and third equations obviously contradict each other, the solution
process fails again. This problem would not arise if the first and second, or the first
and the third, equations were interchanged in Eqs. (c) before the elimination.

The last example illustrates the extreme case where ε was so small that roundoff
errors resulted in total failure of the solution. If we were to make ε somewhat bigger
so that the solution would not “bomb” any more, the roundoff errors might still be
large enough to render the solution unreliable. Again, this difficulty could be avoided
by pivoting.
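
The failure described above can be reproduced in double precision. The following sketch is a hedged illustration: the value chosen for ε (called epsilon in the code) is an assumption picked to be small enough that 2 − 1/ε rounds to −1/ε, Gauss elimination without pivoting is written inline, and MATLAB's backslash operator, which interchanges rows as needed, stands in for a pivoting solver.

```matlab
% Sketch: Eqs. (c) solved without and with row interchanges.
% The value of epsilon is an assumption chosen to trigger the failure.
epsilon = 1.0e-17;
A = [epsilon -1  1;
          -1  2 -1;
           2 -1  0];
b = [0; 0; 1];
n = length(b);
Ag = A;  bg = b;                 % working copies for naive elimination
for k = 1:n-1                    % Gauss elimination, no pivoting
    for i = k+1:n
        lambda = Ag(i,k)/Ag(k,k);
        Ag(i,k+1:n) = Ag(i,k+1:n) - lambda*Ag(k,k+1:n);
        bg(i) = bg(i) - lambda*bg(k);
    end
end
for k = n:-1:1                   % back substitution
    bg(k) = (bg(k) - Ag(k,k+1:n)*bg(k+1:n))/Ag(k,k);
end
xNoPivot = bg;                   % not finite: the last pivot rounded to zero
xPivot = A\b;                    % rows interchanged internally
```

Without pivoting the computed vector contains Inf or NaN entries, whereas the pivoted solution stays close to $x_1 = x_2 = x_3 = 1$.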


















Diagonal Dominance 

An n × n matrix A is said to be diagonally dominant if each diagonal element is larger
than the sum of the other elements in the same row (we are talking here about absolute
values). Thus diagonal dominance requires that

$$
|A_{ii}| > \sum_{\substack{j=1 \\ j \ne i}}^{n} |A_{ij}| \quad (i = 1, 2, \ldots, n) \tag{2.30}
$$

For example, the matrix

$$
\begin{bmatrix}
-2 & 4 & -1 \\
1 & -1 & 3 \\
4 & -2 & 1
\end{bmatrix}
$$

is not diagonally dominant, but if we rearrange the rows in the following manner

$$
\begin{bmatrix}
4 & -2 & 1 \\
-2 & 4 & -1 \\
1 & -1 & 3
\end{bmatrix}
$$

then we have diagonal dominance.
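
Condition (2.30) is simple to test numerically. In the sketch below the helper name isDominant is hypothetical; it relies on the fact that $|A_{ii}|$ exceeds the sum of the other magnitudes in its row exactly when $2|A_{ii}|$ exceeds the sum of all magnitudes in that row.

```matlab
% Sketch: testing for diagonal dominance as defined in Eq. (2.30).
% The helper name isDominant is hypothetical.
isDominant = @(A) all(2*abs(diag(A)) > sum(abs(A),2));
A1 = [-2 4 -1; 1 -1 3; 4 -2 1];  % rows in the original order
A2 = A1([3 1 2],:);              % rows rearranged as in the text
dom1 = isDominant(A1);           % false for the original ordering
dom2 = isDominant(A2);           % true after the rearrangement
```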

It can be shown that if the coefficient matrix A of the equations Ax = b is diagonally
dominant, then the solution does not benefit from pivoting; that is, the equations are
already arranged in the optimal order. It follows that the strategy of pivoting should be
to reorder the equations so that the coefficient matrix is as close to diagonal dominance
as possible. This is the principle behind scaled row pivoting, discussed next.


Gauss Elimination with Scaled Row Pivoting 

Consider the solution of Ax = b by Gauss elimination with row pivoting. Recall that
pivoting aims at improving diagonal dominance of the coefficient matrix, i.e., making
the pivot element as large as possible in comparison to other elements in the pivot
row. The comparison is made easier if we establish an array s, with the elements

$$
s_i = \max_j |A_{ij}|, \quad i = 1, 2, \ldots, n \tag{2.31}
$$

Thus $s_i$, called the scale factor of row i, contains the absolute value of the largest
element in the ith row of A. The vector s can be obtained with the following algorithm:

for i = 1:n
    s(i) = max(abs(A(i,1:n)));
end

The relative size of any element $A_{ij}$ (i.e., relative to the largest element in the ith
row) is defined as the ratio

$$
r_{ij} = \frac{|A_{ij}|}{s_i}
$$

Suppose that the elimination phase has reached the stage where the kth row has
become the pivot row. The augmented coefficient matrix at this point is shown below.

$$
\left[\begin{array}{cccccc|c}
A_{11} & A_{12} & A_{13} & A_{14} & \cdots & A_{1n} & b_1 \\
0 & A_{22} & A_{23} & A_{24} & \cdots & A_{2n} & b_2 \\
0 & 0 & A_{33} & A_{34} & \cdots & A_{3n} & b_3 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & A_{kk} & \cdots & A_{kn} & b_k \\
\vdots & & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & A_{nk} & \cdots & A_{nn} & b_n
\end{array}\right]
$$

We don’t automatically accept $A_{kk}$ as the pivot element, but look in the kth column
below $A_{kk}$ for a “better” pivot. The best choice is the element $A_{pk}$ that has the largest
relative size; that is, we choose p such that

$$
r_{pk} = \max_{j \ge k} r_{jk}
$$

If we find such an element, then we interchange the rows k and p, and proceed with 
the elimination pass as usual. Note that the corresponding row interchange must also 
be carried out in the scale factor array s. The algorithm that does all this is 


for k = 1:n-1
    % Find element with largest relative size
    % and the corresponding row number p
    [Amax,p] = max(abs(A(k:n,k))./s(k:n));
    p = p + k - 1;
    % If this element is very small, matrix is singular
    if Amax < eps
        error('Matrix is singular')
    end
    % Interchange rows k and p if needed
    if p ~= k
        b = swapRows(b,k,p);
        s = swapRows(s,k,p);
        A = swapRows(A,k,p);
    end
    % Elimination pass
end

■ swapRows 

The function swapRows interchanges rows i and j of a matrix or vector v: 

function v = swapRows(v,i,j) 

% Swap rows i and j of vector or matrix v. 

% USAGE: v = swapRows(v,i,j) 

temp = v(i,:); 
v(i,:) = v(j,:); 
v(j,:) = temp; 

■ gaussPiv 

The function gaussPiv performs Gauss elimination with row pivoting. Apart from 
row swapping, the elimination and solution phases are identical to those of function 
gauss in Art. 2.2. 

function x = gaussPiv(A,b)
% Solves A*x = b by Gauss elimination with row pivoting.
% USAGE: x = gaussPiv(A,b)

if size(b,2) > 1; b = b'; end
n = length(b); s = zeros(n,1);
% Set up scale factor array
for i = 1:n; s(i) = max(abs(A(i,1:n))); end
% Exchange rows if necessary
for k = 1:n-1
    [Amax,p] = max(abs(A(k:n,k))./s(k:n));
    p = p + k - 1;
    if Amax < eps; error('Matrix is singular'); end
    if p ~= k
        b = swapRows(b,k,p);
        s = swapRows(s,k,p);
        A = swapRows(A,k,p);
    end
    % Elimination pass
    for i = k+1:n
        if A(i,k) ~= 0
            lambda = A(i,k)/A(k,k);
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n);
            b(i) = b(i) - lambda*b(k);
        end
    end
end
% Back substitution phase
for k = n:-1:1
    b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k);
end
x = b;

■ LUdecPiv 

The Gauss elimination algorithm can be changed to Doolittle's decomposition with 
minor changes. The most important of these is keeping a record of the row 
interchanges during the decomposition phase. In LUdecPiv this record is kept in the 
permutation array perm, initially set to $[1, 2, \ldots, n]^T$. Whenever two rows are 
interchanged, the corresponding interchange is also carried out in perm. Thus perm shows 
how the original rows were permuted. This information is then passed to the function 
LUsolPiv, which rearranges the elements of the constant vector in the same order 
before carrying out forward and back substitutions. 

function [A,perm] = LUdecPiv(A)
% LU decomposition of matrix A; returns A = [L\U]
% and the row permutation vector 'perm'.
% USAGE: [A,perm] = LUdecPiv(A)

n = size(A,1); s = zeros(n,1);
perm = (1:n)';
% Set up scale factor array
for i = 1:n; s(i) = max(abs(A(i,1:n))); end
% Exchange rows if necessary
for k = 1:n-1
    [Amax,p] = max(abs(A(k:n,k))./s(k:n));
    p = p + k - 1;
    if Amax < eps
        error('Matrix is singular')
    end
    if p ~= k
        s = swapRows(s,k,p);
        A = swapRows(A,k,p);
        perm = swapRows(perm,k,p);
    end
    % Elimination pass
    for i = k+1:n
        if A(i,k) ~= 0
            lambda = A(i,k)/A(k,k);
            A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n);
            A(i,k) = lambda;
        end
    end
end

■ LUsolPiv 

function x = LUsolPiv(A,b,perm)
% Solves L*U*x = b, where A contains row-wise
% permutation of L and U in the form A = [L\U].
% Vector 'perm' holds the row permutation data.
% USAGE: x = LUsolPiv(A,b,perm)

% Rearrange b, store it in x
if size(b,2) > 1; b = b'; end   % make b a column vector
n = size(A,1);
x = b;
for i = 1:n; x(i) = b(perm(i)); end
% Forward and back substitution
for k = 2:n
    x(k) = x(k) - A(k,1:k-1)*x(1:k-1);
end
for k = n:-1:1
    x(k) = (x(k) - A(k,k+1:n)*x(k+1:n))/A(k,k);
end
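
The role of the permutation record can be cross-checked with MATLAB's built-in lu function. This is only a sketch under stated assumptions: lu uses ordinary (unscaled) partial pivoting, so its row order need not agree with the perm vector produced by LUdecPiv. The matrix and constant vector are taken from Example 2.12; the names P, permBuiltin and x are illustrative.

```matlab
A = [2 -2 6; -2 4 3; -1 8 4];
b = [16; 0; -1];
n = size(A,1);

% Built-in decomposition with row interchanges: L*U = P*A
[L,U,P] = lu(A);

% Row order as an index vector, analogous to 'perm' in the text
permBuiltin = P*(1:n)';

% Permute b the same way as the rows of A, then substitute
y = L\(P*b);       % forward substitution
x = U\y;           % back substitution
```

The point of the sketch is that the constant vector must be rearranged in the same order as the rows before the two substitution phases, exactly as LUsolPiv does with perm.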


When to Pivot 

Pivoting has a couple of drawbacks. One of these is the increased cost of computation; 
the other is the destruction of the symmetry and banded structure of the coefficient 
matrix. The latter is of particular concern in engineering computing, where the 
coefficient matrices are frequently banded and symmetric, a property that is utilized 
in the solution, as seen in the previous article. Fortunately, these matrices are often 
diagonally dominant as well, so that they would not benefit from pivoting anyway. 

There are no infallible rules for determining when pivoting should be used. 
Experience indicates that pivoting is likely to be counterproductive if the coefficient matrix 
is banded. Positive definite and, to a lesser degree, symmetric matrices also seldom 
gain from pivoting. And we should not forget that pivoting is not the only means of 
controlling roundoff errors; there is also double precision arithmetic. 

It should be strongly emphasized that the above rules of thumb are only meant 
for equations that stem from real engineering problems. It is not difficult to concoct 
“textbook” examples that do not conform to these rules. 


EXAMPLE 2.12 

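Employ Gauss elimination with scaled row pivoting to solve the equations Ax = b, 
where 

$$\mathbf{A} = \begin{bmatrix} 2 & -2 & 6 \\ -2 & 4 & 3 \\ -1 & 8 & 4 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 16 \\ 0 \\ -1 \end{bmatrix}$$
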

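Solution The augmented coefficient matrix and the scale factor array are 

$$[\mathbf{A} \,|\, \mathbf{b}] = \left[\begin{array}{rrr|r} 2 & -2 & 6 & 16 \\ -2 & 4 & 3 & 0 \\ -1 & 8 & 4 & -1 \end{array}\right] \qquad \mathbf{s} = \begin{bmatrix} 6 \\ 4 \\ 8 \end{bmatrix}$$

Note that s contains the absolute value of the largest element in each row of A. At this 
stage, all the elements in the first column of A are potential pivots. To determine the 
best pivot element, we calculate the relative sizes of the elements in the first column: 

$$\begin{bmatrix} r_{11} \\ r_{21} \\ r_{31} \end{bmatrix} = \begin{bmatrix} |A_{11}|/s_1 \\ |A_{21}|/s_2 \\ |A_{31}|/s_3 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 1/2 \\ 1/8 \end{bmatrix}$$

Since $r_{21}$ is the biggest element, we conclude that $A_{21}$ makes the best pivot element. 
Therefore, we exchange rows 1 and 2 of the augmented coefficient matrix and the 
scale factor array, obtaining 

$$[\mathbf{A} \,|\, \mathbf{b}] = \left[\begin{array}{rrr|r} -2 & 4 & 3 & 0 \\ 2 & -2 & 6 & 16 \\ -1 & 8 & 4 & -1 \end{array}\right] \begin{matrix} \leftarrow \\ \phantom{0} \\ \phantom{0} \end{matrix} \qquad \mathbf{s} = \begin{bmatrix} 4 \\ 6 \\ 8 \end{bmatrix}$$

Now the first pass of Gauss elimination is carried out (the arrow points to the pivot 
row), yielding 

$$[\mathbf{A}' \,|\, \mathbf{b}'] = \left[\begin{array}{rrr|r} -2 & 4 & 3 & 0 \\ 0 & 2 & 9 & 16 \\ 0 & 6 & 5/2 & -1 \end{array}\right] \qquad \mathbf{s} = \begin{bmatrix} 4 \\ 6 \\ 8 \end{bmatrix}$$

The potential pivot elements for the next elimination pass are $A'_{22}$ and $A'_{32}$. We 
determine the "winner" from 

$$\begin{bmatrix} * \\ r_{22} \\ r_{32} \end{bmatrix} = \begin{bmatrix} * \\ |A'_{22}|/s_2 \\ |A'_{32}|/s_3 \end{bmatrix} = \begin{bmatrix} * \\ 1/3 \\ 3/4 \end{bmatrix}$$

Note that $r_{12}$ is irrelevant, since row 1 already acted as the pivot row. Therefore, it is 
excluded from further consideration. As $r_{32}$ is larger than $r_{22}$, the third row is the better 
pivot row. After interchanging rows 2 and 3, we have 

$$[\mathbf{A}' \,|\, \mathbf{b}'] = \left[\begin{array}{rrr|r} -2 & 4 & 3 & 0 \\ 0 & 6 & 5/2 & -1 \\ 0 & 2 & 9 & 16 \end{array}\right] \qquad \mathbf{s} = \begin{bmatrix} 4 \\ 8 \\ 6 \end{bmatrix}$$

The second elimination pass now yields 

$$[\mathbf{A}'' \,|\, \mathbf{b}''] = [\mathbf{U} \,|\, \mathbf{c}] = \left[\begin{array}{rrr|r} -2 & 4 & 3 & 0 \\ 0 & 6 & 5/2 & -1 \\ 0 & 0 & 49/6 & 49/3 \end{array}\right]$$

This completes the elimination phase. It should be noted that U is the matrix that 
would result in the LU decomposition of the following row-wise permutation of A (the 
ordering of rows is the same as achieved by pivoting): 

$$\begin{bmatrix} -2 & 4 & 3 \\ -1 & 8 & 4 \\ 2 & -2 & 6 \end{bmatrix}$$

Since the solution of Ux = c by back substitution is not affected by pivoting, we skip 
the detailed computation. The result is $\mathbf{x}^T = \begin{bmatrix} 1 & -1 & 2 \end{bmatrix}$. 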
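
The hand computation above can be checked with a short script. The following is a minimal sketch that carries out scaled row pivoting inline (without calling gaussPiv or swapRows) so that it is self-contained; the indexed row swaps and the variable name rmax are illustrative choices, not part of the listings in the text.

```matlab
A = [2 -2 6; -2 4 3; -1 8 4];
b = [16; 0; -1];
n = length(b);
s = max(abs(A),[],2);               % scale factors of the rows

for k = 1:n-1
    % Row with the largest relative pivot size in column k
    [rmax,p] = max(abs(A(k:n,k))./s(k:n));
    p = p + k - 1;
    if p ~= k                       % swap rows k and p of A, b and s
        A([k p],:) = A([p k],:);
        b([k p])   = b([p k]);
        s([k p])   = s([p k]);
    end
    for i = k+1:n                   % elimination pass
        lambda = A(i,k)/A(k,k);
        A(i,k:n) = A(i,k:n) - lambda*A(k,k:n);
        b(i) = b(i) - lambda*b(k);
    end
end

x = zeros(n,1);
for k = n:-1:1                      % back substitution
    x(k) = (b(k) - A(k,k+1:n)*x(k+1:n))/A(k,k);
end
```

After the loop, A holds U and b holds c, which can be compared with the final augmented matrix of the example.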

Alternate Solution It is not necessary to physically exchange equations during 
pivoting. We could accomplish Gauss elimination just as well by keeping the equations 
in place. The elimination would then proceed as follows (for the sake of brevity, we 
skip repeating the details of choosing the pivot equation): 

$$[\mathbf{A} \,|\, \mathbf{b}] = \left[\begin{array}{rrr|r} 2 & -2 & 6 & 16 \\ -2 & 4 & 3 & 0 \\ -1 & 8 & 4 & -1 \end{array}\right]$$

$$[\mathbf{A}' \,|\, \mathbf{b}'] = \left[\begin{array}{rrr|r} 0 & 2 & 9 & 16 \\ -2 & 4 & 3 & 0 \\ 0 & 6 & 5/2 & -1 \end{array}\right]$$

$$[\mathbf{A}'' \,|\, \mathbf{b}''] = \left[\begin{array}{rrr|r} 0 & 0 & 49/6 & 49/3 \\ -2 & 4 & 3 & 0 \\ 0 & 6 & 5/2 & -1 \end{array}\right]$$

But now the back substitution phase is a little more involved, since the order in which 
the equations must be solved has become scrambled. In hand computations this is 
not a problem, because we can determine the order by inspection. Unfortunately, 
"by inspection" does not work on a computer. To overcome this difficulty, we have 
to maintain an integer array p that keeps track of the row permutations during the 
elimination phase. The contents of p indicate the order in which the pivot rows were 
chosen. In this example, we would have at the end of Gauss elimination 

$$\mathbf{p} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$$

showing that row 2 was the pivot row in the first elimination pass, followed by row 3 in 
the second pass. The equations are solved by back substitution in the reverse order: 
equation 1 is solved first for $x_3$, then equation 3 is solved for $x_2$, and finally equation 
2 yields $x_1$. 

By dispensing with swapping of equations, the scheme outlined above would 
probably result in a faster (and more complex) algorithm than gaussPiv, but the 
number of equations would have to be quite large before the difference becomes 
noticeable. 
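
The bookkeeping described in the alternate solution can be sketched as follows. This is an illustrative script only, not the author's implementation: the equations stay in place, and the integer array p records which row served as the pivot row in each pass. The names rows, m, pr and r are hypothetical helpers.

```matlab
A = [2 -2 6; -2 4 3; -1 8 4];
b = [16; 0; -1];
n = length(b);
s = max(abs(A),[],2);          % scale factors (never reordered)
p = (1:n)';                    % p(k) = row used as pivot row in pass k

for k = 1:n-1
    rows = p(k:n);             % rows not yet used as pivot rows
    [rmax,m] = max(abs(A(rows,k))./s(rows));
    m = m + k - 1;
    p([k m]) = p([m k]);       % record the choice; no rows are moved
    pr = p(k);                 % pivot row for this pass
    for i = p(k+1:n)'          % eliminate from the remaining rows
        lambda = A(i,k)/A(pr,k);
        A(i,k:n) = A(i,k:n) - lambda*A(pr,k:n);
        b(i) = b(i) - lambda*b(pr);
    end
end

x = zeros(n,1);
for k = n:-1:1                 % back substitution in the order given by p
    r = p(k);
    x(k) = (b(r) - A(r,k+1:n)*x(k+1:n))/A(r,k);
end
```

At the end of the elimination, p should contain the same pivot order as in the hand computation, and the back substitution visits the equations in the reverse of that order.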


PROBLEM SET 2.2 

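1. Solve the equations Ax = b by utilizing Doolittle's decomposition, where 

$$\mathbf{A} = \begin{bmatrix} 3 & -3 & 3 \\ -3 & 5 & 1 \\ 3 & 1 & 5 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 9 \\ -7 \\ 12 \end{bmatrix}$$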

2. Use Doolittle's decomposition to solve Ax = b, where 

$$\mathbf{A} = \begin{bmatrix} 4 & 8 & 20 \\ 8 & 13 & 16 \\ 20 & 16 & -91 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 24 \\ 18 \\ -119 \end{bmatrix}$$


3. Determine L and D that result from Doolittle's decomposition of the matrix 

$$\mathbf{A} = \begin{bmatrix} 2 & -2 & 0 & 0 & 0 \\ -2 & 5 & -6 & 0 & 0 \\ 0 & -6 & 16 & 12 & 0 \\ 0 & 0 & 12 & 39 & -6 \\ 0 & 0 & 0 & -6 & 14 \end{bmatrix}$$

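4. Solve the tridiagonal equations Ax = b by Doolittle's decomposition method, 
where 

$$\mathbf{A} = \begin{bmatrix} 6 & 2 & 0 & 0 & 0 \\ -1 & 7 & 2 & 0 & 0 \\ 0 & -2 & 8 & 2 & 0 \\ 0 & 0 & 3 & 7 & -2 \\ 0 & 0 & 0 & 3 & 5 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 2 \\ -3 \\ 4 \\ -3 \\ 1 \end{bmatrix}$$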

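5. Use Gauss elimination with scaled row pivoting to solve 

$$\begin{bmatrix} 4 & -2 & 1 \\ -2 & 1 & -1 \\ -2 & 3 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$$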

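6. Solve Ax = b by Gauss elimination with scaled row pivoting, where 

$$\mathbf{A} = \begin{bmatrix} 2.34 & -4.10 & 1.78 \\ -1.98 & 3.47 & -2.22 \\ 2.36 & -15.17 & 6.81 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 0.02 \\ -0.73 \\ -6.63 \end{bmatrix}$$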

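7. Solve the equations 

$$\begin{bmatrix} 2 & -1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & -1 & 2 & -1 \\ -1 & 2 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

by Gauss elimination with scaled row pivoting. 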

8. ■ Solve the equations 

$$\begin{bmatrix} 0 & 2 & 5 & -1 \\ 2 & 1 & 3 & 0 \\ -2 & -1 & 3 & 1 \\ 3 & 3 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -3 \\ 3 \\ -2 \\ 5 \end{bmatrix}$$

9. ■ Solve the symmetric, tridiagonal equations 

$$\begin{aligned} 4x_1 - x_2 &= 9 \\ -x_{i-1} + 4x_i - x_{i+1} &= 5, \quad i = 2, \ldots, n-1 \\ -x_{n-1} + 4x_n &= 5 \end{aligned}$$

with n = 10. 

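10. ■ Solve the equations Ax = b, where 

$$\mathbf{A} = \begin{bmatrix} 1.3174 & 2.7250 & 2.7250 & 1.7181 \\ 0.4002 & 0.8278 & 1.2272 & 2.5322 \\ 0.8218 & 1.5608 & 0.3629 & 2.9210 \\ 1.9664 & 2.0011 & 0.6532 & 1.9945 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 8.4855 \\ 4.9874 \\ 5.6665 \\ 6.6152 \end{bmatrix}$$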

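11. ■ Solve the equations 

$$\begin{bmatrix}
10 & -2 & -1 & 2 & 3 & 1 & -4 & 7 \\
5 & 11 & 3 & 10 & -3 & 3 & 3 & -4 \\
7 & 12 & 1 & 5 & 3 & -12 & 2 & 3 \\
8 & 7 & -2 & 1 & 3 & 2 & 2 & 4 \\
2 & -15 & -1 & 1 & 4 & -1 & 8 & 3 \\
4 & 2 & 9 & 1 & 12 & -1 & 4 & 1 \\
-1 & 4 & -7 & -1 & 1 & 1 & -1 & -3 \\
-1 & 3 & 4 & 1 & 3 & -4 & 7 & 6
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{bmatrix} =
\begin{bmatrix} 0 \\ 12 \\ -5 \\ 3 \\ -25 \\ -26 \\ 9 \\ -7 \end{bmatrix}$$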

12. ■ The system shown in Fig. (a) consists of n linear springs that support n masses. 
The spring stiffnesses are denoted by $k_i$, the weights of the masses are $W_i$, and 
$x_i$ are the displacements of the masses (measured from the positions where the 
springs are undeformed). The so-called displacement formulation is obtained by 
writing the equilibrium equation of each mass and substituting $F_i = k_i(x_{i+1} - x_i)$ 
for the spring forces. The result is the symmetric, tridiagonal set of equations 

$$\begin{aligned} (k_1 + k_2)x_1 - k_2 x_2 &= W_1 \\ -k_i x_{i-1} + (k_i + k_{i+1})x_i - k_{i+1} x_{i+1} &= W_i, \quad i = 2, 3, \ldots, n-1 \\ -k_n x_{n-1} + k_n x_n &= W_n \end{aligned}$$

Write a program that solves these equations for given values of n, k and W. Run 
the program with n = 5 and 

$$k_1 = k_2 = k_3 = 10 \text{ N/mm} \qquad k_4 = k_5 = 5 \text{ N/mm}$$
$$W_1 = W_3 = W_5 = 100 \text{ N} \qquad W_2 = W_4 = 50 \text{ N}$$


13. ■ The displacement formulation for the mass-spring system shown in Fig. (b) 
results in the following equilibrium equations of the masses: 

$$\begin{bmatrix} k_1 + k_2 + k_3 + k_5 & -k_3 & -k_5 \\ -k_3 & k_3 + k_4 & -k_4 \\ -k_5 & -k_4 & k_4 + k_5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} W_1 \\ W_2 \\ W_3 \end{bmatrix}$$

where $k_i$ are the spring stiffnesses, $W_i$ represent the weights of the masses, and 
$x_i$ are the displacements of the masses from the undeformed configuration of 
the system. Write a program that solves these equations, given k and W. Use the 
program to find the displacements if 

$$k_1 = k_3 = k_4 = k \qquad k_2 = k_5 = 2k$$
$$W_1 = W_3 = 2W \qquad W_2 = W$$


14. ■ The displacement formulation for a plane truss is similar to that of a 
mass-spring system. The differences are: (1) the stiffnesses of the members are 
$k_i = (EA/L)_i$, where E is the modulus of elasticity, A represents the 
cross-sectional area and L is the length of the member; (2) there are two 
components of displacement at each joint. For the statically indeterminate truss 
shown the displacement formulation yields the symmetric equations Ku = p, 
where 

$$\mathbf{K} = \begin{bmatrix} 27.58 & 7.004 & -7.004 & 0.0000 & 0.0000 \\ 7.004 & 29.57 & -5.253 & 0.0000 & -24.32 \\ -7.004 & -5.253 & 29.57 & 0.0000 & 0.0000 \\ 0.0000 & 0.0000 & 0.0000 & 27.58 & -7.004 \\ 0.0000 & -24.32 & 0.0000 & -7.004 & 29.57 \end{bmatrix} \text{ MN/m}$$

$$\mathbf{p} = \begin{bmatrix} 0 & 0 & 0 & 0 & -45 \end{bmatrix}^T \text{ kN}$$

Determine the displacements $u_i$ of the joints. 

15. ■ In the force formulation of a truss, the unknowns are the member forces $P_i$. For 
the statically determinate truss shown, the equilibrium equations of the joints 
are: 

$$\begin{bmatrix}
-1 & 1 & -1/\sqrt{2} & 0 & 0 & 0 \\
0 & 0 & 1/\sqrt{2} & 1 & 0 & 0 \\
0 & -1 & 0 & 0 & -1/\sqrt{2} & 0 \\
0 & 0 & 0 & 0 & 1/\sqrt{2} & 0 \\
0 & 0 & 0 & 0 & 1/\sqrt{2} & 1 \\
0 & 0 & 0 & -1 & -1/\sqrt{2} & 0
\end{bmatrix}
\begin{bmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \\ P_5 \\ P_6 \end{bmatrix} =
\begin{bmatrix} 0 \\ 18 \\ 0 \\ 12 \\ 0 \\ 0 \end{bmatrix}$$

where the units of $P_i$ are kN. (a) Solve the equations as they are with a computer 
program. (b) Rearrange the rows and columns so as to obtain a lower triangular 
coefficient matrix, and then solve the equations by back substitution using a 
calculator. 

16. ■ The force formulation of the symmetric truss shown results in the joint 
equilibrium equations 

$$\begin{bmatrix}
c & 1 & 0 & 0 & 0 \\
0 & s & 0 & 0 & 1 \\
0 & 0 & 2s & 0 & 0 \\
0 & -c & c & 1 & 0 \\
0 & s & s & 0 & 0
\end{bmatrix}
\begin{bmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \\ P_5 \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$

where $s = \sin\theta$, $c = \cos\theta$ and $P_i$ are the unknown forces. Write a program that 
computes the forces, given the angle $\theta$. Run the program with $\theta = 53^\circ$. 



17. ■ The electrical network shown can be viewed as consisting of three loops. 
Applying Kirchhoff's law ($\sum$ voltage drops = $\sum$ voltage sources) to each loop yields the 
following equations for the loop currents $i_1$, $i_2$ and $i_3$: 

$$\begin{aligned} 5i_1 + 15(i_1 - i_3) &= 220 \text{ V} \\ R(i_2 - i_3) + 5i_2 + 10i_2 &= 0 \\ 20i_3 + R(i_3 - i_2) + 15(i_3 - i_1) &= 0 \end{aligned}$$

Compute the three loop currents for R = 5, 10 and 20 $\Omega$. 

18. ■ Determine the loop currents $i_1$ to $i_4$ in the electrical network shown. 

19. ■ Consider the n simultaneous equations Ax = b, where 

$$A_{ij} = (i + j)^2 \qquad b_i = \sum_{j=0}^{n-1} A_{ij}, \quad i = 0, 1, \ldots, n-1, \quad j = 0, 1, \ldots, n-1$$

The solution is $\mathbf{x} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T$. Write a program that solves these equations 
for any given n (pivoting is recommended). Run the program with n = 2, 3 and 4, 
and comment on the results. 


*2.6 Matrix Inversion 

Computing the inverse of a matrix and solving simultaneous equations are related 
tasks. The most economical way to invert an n × n matrix A is to solve the equations 

$$\mathbf{AX} = \mathbf{I} \tag{2.33}$$

where I is the n × n identity matrix. The solution X, also of size n × n, will be the 
inverse of A. The proof is simple: after we premultiply both sides of Eq. (2.33) by $\mathbf{A}^{-1}$ 
we have $\mathbf{A}^{-1}\mathbf{AX} = \mathbf{A}^{-1}\mathbf{I}$, which reduces to $\mathbf{X} = \mathbf{A}^{-1}$. 
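
Equation (2.33) can be tried directly in MATLAB. The following is a minimal sketch, assuming the built-in backslash operator as a stand-in for the LU functions of this chapter; the matrix is the one inverted in Example 2.13, and the names X and residual are illustrative.

```matlab
A = [ 0.6 -0.4 1.0
     -0.3  0.2 0.5
      0.6 -1.0 0.5];
n = size(A,1);

% Solve A*X = I; each column of I acts as one right-hand side
X = A\eye(n);

% A*X should reproduce the identity matrix up to roundoff
residual = norm(A*X - eye(n));
```

Each column of X is the solution for one column of the identity matrix, which is the column-by-column view of inversion used in the cost estimate of this article.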

Inversion of large matrices should be avoided whenever possible due to its high cost. 
As seen from Eq. (2.33), inversion of A is equivalent to solving $\mathbf{Ax}_i = \mathbf{b}_i$, i = 1, 2, ..., n, 
where $\mathbf{b}_i$ is the ith column of I. If LU decomposition is employed in the solution, the 
solution phase (forward and back substitution) must be repeated n times, once for 
each $\mathbf{b}_i$. Since the cost of computation is proportional to $n^3$ for the decomposition 
phase and $n^2$ for each vector of the solution phase, inversion is considerably 
more expensive than the solution of Ax = b (single constant vector b). 

Matrix inversion has another serious drawback: a banded matrix loses its 
structure during inversion. In other words, if A is banded or otherwise sparse, then $\mathbf{A}^{-1}$ is 
generally fully populated. However, the inverse of a triangular matrix remains triangular. 


EXAMPLE 2.13 

Write a function that inverts a matrix using LU decomposition with pivoting. Test the 
function by inverting 

$$\mathbf{A} = \begin{bmatrix} 0.6 & -0.4 & 1.0 \\ -0.3 & 0.2 & 0.5 \\ 0.6 & -1.0 & 0.5 \end{bmatrix}$$

Solution The function matInv listed below inverts any nonsingular matrix A. 

function Ainv = matInv(A)
% Inverts matrix A with LU decomposition.
% USAGE: Ainv = matInv(A)

n = size(A,1);
Ainv = eye(n);            % Store RHS vectors in Ainv.
[A,perm] = LUdecPiv(A);   % Decompose A.
% Solve for each RHS vector and store results in Ainv
% replacing the corresponding RHS vector.
for i = 1:n
    Ainv(:,i) = LUsolPiv(A,Ainv(:,i),perm);
end


The following test program computes the inverse of the given matrix and checks 
whether $\mathbf{AA}^{-1} = \mathbf{I}$: 

% Example 2.13 (Matrix inversion)
A = [ 0.6 -0.4 1.0
     -0.3  0.2 0.5
      0.6 -1.0 0.5];
Ainv = matInv(A)
check = A*Ainv

Here are the results: 

>> Ainv =
    1.6667   -2.2222   -1.1111
    1.2500   -0.8333   -1.6667
    0.5000    1.0000         0
check =
    1.0000   -0.0000   -0.0000
         0    1.0000    0.0000
         0   -0.0000    1.0000

EXAMPLE 2.14 

Invert the matrix 

$$\mathbf{A} = \begin{bmatrix}
2 & -1 & 0 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 & 0 \\
0 & 0 & -1 & 2 & -1 & 0 \\
0 & 0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & 0 & -1 & 5
\end{bmatrix}$$

Solution Since the matrix is tridiagonal, we solve AX = I using the functions LUdec3 
and LUsol3 (LU decomposition for tridiagonal matrices): 

% Example 2.14 (Matrix inversion)
n = 6;
d = ones(n,1)*2;
e = -ones(n-1,1);
c = e;
d(n) = 5;
[c,d,e] = LUdec3(c,d,e);
for i = 1:n
    b = zeros(n,1);
    b(i) = 1;
    Ainv(:,i) = LUsol3(c,d,e,b);
end
Ainv

The result is 

>> Ainv =
    0.8400    0.6800    0.5200    0.3600    0.2000    0.0400
    0.6800    1.3600    1.0400    0.7200    0.4000    0.0800
    0.5200    1.0400    1.5600    1.0800    0.6000    0.1200
    0.3600    0.7200    1.0800    1.4400    0.8000    0.1600
    0.2000    0.4000    0.6000    0.8000    1.0000    0.2000
    0.0400    0.0800    0.1200    0.1600    0.2000    0.2400

Note that although A is tridiagonal, $\mathbf{A}^{-1}$ is fully populated. 








*2.7 Iterative Methods 
Introduction 

So far, we have discussed only direct methods of solution. The common characteristic 
of these methods is that they compute the solution with a finite number of operations. 
Moreover, if the computer were capable of infinite precision (no roundoff errors), the 
solution would be exact. 

Iterative, or indirect, methods start with an initial guess of the solution x and 
then repeatedly improve the solution until the change in x becomes negligible. Since 
the required number of iterations can be very large, the indirect methods are, in 
general, slower than their direct counterparts. However, iterative methods do have 
the following advantages that make them attractive for certain problems: 

1. It is feasible to store only the nonzero elements of the coefficient matrix. This 
makes it possible to deal with very large matrices that are sparse, but not 
necessarily banded. In many problems, there is no need to store the coefficient matrix 
at all. 

2. Iterative procedures are self-correcting, meaning that roundoff errors (or even 
arithmetic mistakes) in one iterative cycle are corrected in subsequent cycles. 

A serious drawback of iterative methods is that they do not always converge to the 
solution. It can be shown that convergence is guaranteed only if the coefficient matrix 
is diagonally dominant. The initial guess for x plays no role in determining whether 
convergence takes place—if the procedure converges for one starting vector, it would 
do so for any starting vector. The initial guess affects only the number of iterations 
that are required for convergence. 

Gauss-Seidel Method 

The equations Ax = b are in scalar notation 

$$\sum_{j=1}^{n} A_{ij} x_j = b_i, \quad i = 1, 2, \ldots, n$$

Extracting the term containing $x_i$ from the summation sign yields 

$$A_{ii} x_i + \sum_{\substack{j=1 \\ j \ne i}}^{n} A_{ij} x_j = b_i, \quad i = 1, 2, \ldots, n$$

Solving for $x_i$, we get 

$$x_i = \frac{1}{A_{ii}} \left( b_i - \sum_{\substack{j=1 \\ j \ne i}}^{n} A_{ij} x_j \right), \quad i = 1, 2, \ldots, n$$

The last equation suggests the following iterative scheme 

$$x_i \leftarrow \frac{1}{A_{ii}} \left( b_i - \sum_{\substack{j=1 \\ j \ne i}}^{n} A_{ij} x_j \right), \quad i = 1, 2, \ldots, n \tag{2.34}$$


We start by choosing the starting vector x. If a good guess for the solution is not 
available, x can be chosen randomly. Equation (2.34) is then used to recompute each 
element of x, always using the latest available values of xj. This completes one iteration 
cycle. The procedure is repeated until the changes in x between successive iteration 
cycles become sufficiently small. 

Convergence of the Gauss-Seidel method can be improved by a technique known 
as relaxation. The idea is to take the new value of $x_i$ as a weighted average of its previous 
value and the value predicted by Eq. (2.34). The corresponding iterative formula is 

$$x_i \leftarrow \frac{\omega}{A_{ii}} \left( b_i - \sum_{\substack{j=1 \\ j \ne i}}^{n} A_{ij} x_j \right) + (1 - \omega) x_i, \quad i = 1, 2, \ldots, n \tag{2.35}$$

where the weight $\omega$ is called the relaxation factor. It can be seen that if $\omega = 1$, 
no relaxation takes place, since Eqs. (2.34) and (2.35) produce the same result. If 
$\omega < 1$, Eq. (2.35) represents interpolation between the old $x_i$ and the value given by 
Eq. (2.34). This is called underrelaxation. In cases where $\omega > 1$, we have extrapolation, 
or overrelaxation. 

There is no practical method of determining the optimal value of $\omega$ beforehand; 
however, a good estimate can be computed during run time. Let $\Delta x^{(k)} = |\mathbf{x}^{(k-1)} - \mathbf{x}^{(k)}|$ 
be the magnitude of the change in x during the kth iteration (carried out without 
relaxation; i.e., with $\omega = 1$). If k is sufficiently large (say $k \ge 5$), it can be shown² that 
an approximation of the optimal value of $\omega$ is 

$$\omega_{\text{opt}} \approx \frac{2}{1 + \sqrt{1 - \left( \Delta x^{(k+p)} / \Delta x^{(k)} \right)^{1/p}}} \tag{2.36}$$

where p is a positive integer. 
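
As a quick numerical illustration of Eq. (2.36), the formula can be wrapped in an anonymous function. This is only a sketch: the name omegaOpt and the two change magnitudes below are hypothetical values chosen for illustration, not results from the text.

```matlab
% Eq. (2.36): estimate of the optimal relaxation factor
% dxk  = change in x during iteration k
% dxkp = change in x during iteration k + p
omegaOpt = @(dxk,dxkp,p) 2/(1 + sqrt(1 - (dxkp/dxk)^(1/p)));

% Hypothetical changes recorded one iteration apart (p = 1)
w1 = omegaOpt(1.0e-3, 8.0e-4, 1);
```

When the changes shrink slowly from one iteration to the next (ratio close to 1), the estimate moves toward 2; when they shrink quickly, it moves toward 1, i.e., little or no overrelaxation.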


2 See, for example, Terrence I. Akai, Applied Numerical Methods for Engineers, John Wiley & Sons 
(1994), p. 100. 

The essential elements of a Gauss-Seidel algorithm with relaxation are: 

1. Carry out k iterations with $\omega = 1$ (k = 10 is reasonable). After the kth iteration 
record $\Delta x^{(k)}$. 

2. Perform an additional p iterations ($p \ge 1$) and record $\Delta x^{(k+p)}$ after the last 
iteration. 

3. Perform all subsequent iterations with $\omega = \omega_{\text{opt}}$, where $\omega_{\text{opt}}$ is computed from 
Eq. (2.36). 


■ gaussSeidel 

The function gaussSeidel is an implementation of the Gauss-Seidel method with 
relaxation. It automatically computes $\omega_{\text{opt}}$ from Eq. (2.36) using k = 10 and p = 1. 
The user must provide the function iterEqs that computes the improved x from the 
iterative formulas in Eq. (2.35); see Example 2.17. 


function [x,numIter,omega] = gaussSeidel(func,x,maxIter,epsilon)
% Solves Ax = b by Gauss-Seidel method with relaxation.
% USAGE: [x,numIter,omega] = gaussSeidel(func,x,maxIter,epsilon)
% INPUT:
% func    = handle of function that returns improved x using
%           the iterative formulas in Eq. (2.35).
% x       = starting solution vector
% maxIter = allowable number of iterations (default is 500)
% epsilon = error tolerance (default is 1.0e-9)
% OUTPUT:
% x       = solution vector
% numIter = number of iterations carried out
% omega   = computed relaxation factor

if nargin < 4; epsilon = 1.0e-9; end
if nargin < 3; maxIter = 500; end
k = 10; p = 1; omega = 1;
for numIter = 1:maxIter
    xOld = x;
    x = feval(func,x,omega);
    dx = sqrt(dot(x - xOld,x - xOld));
    if dx < epsilon; return; end
    if numIter == k; dx1 = dx; end
    if numIter == k + p
        omega = 2/(1 + sqrt(1 - (dx/dx1)^(1/p)));
    end
end
error('Too many iterations')


Conjugate Gradient Method 

Consider the problem of finding the vector x that minimizes the scalar function 

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T \mathbf{x} \tag{2.37}$$

where the matrix A is symmetric and positive definite. Because f(x) is minimized when 
its gradient $\nabla f = \mathbf{Ax} - \mathbf{b}$ is zero, we see that minimization is equivalent to solving 

$$\mathbf{Ax} = \mathbf{b} \tag{2.38}$$

Gradient methods accomplish the minimization by iteration, starting with an 
initial vector $\mathbf{x}_0$. Each iterative cycle k computes a refined solution 

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{s}_k \tag{2.39}$$

The step length $\alpha_k$ is chosen so that $\mathbf{x}_{k+1}$ minimizes $f(\mathbf{x}_{k+1})$ in the search direction $\mathbf{s}_k$. 
That is, $\mathbf{x}_{k+1}$ must satisfy Eq. (2.38): 

$$\mathbf{A}(\mathbf{x}_k + \alpha_k \mathbf{s}_k) = \mathbf{b} \tag{a}$$

Introducing the residual 

$$\mathbf{r}_k = \mathbf{b} - \mathbf{A}\mathbf{x}_k \tag{2.40}$$

Eq. (a) becomes $\alpha_k \mathbf{A}\mathbf{s}_k = \mathbf{r}_k$. Premultiplying both sides by $\mathbf{s}_k^T$ and solving for $\alpha_k$, we 
obtain 

$$\alpha_k = \frac{\mathbf{s}_k^T \mathbf{r}_k}{\mathbf{s}_k^T \mathbf{A} \mathbf{s}_k} \tag{2.41}$$

We are still left with the problem of determining the search direction $\mathbf{s}_k$. Intuition 
tells us to choose $\mathbf{s}_k = -\nabla f = \mathbf{r}_k$, since this is the direction of the largest negative 
change in f(x). The resulting procedure is known as the method of steepest descent. 
It is not a popular algorithm due to slow convergence. The more efficient conjugate 
gradient method uses the search direction 

$$\mathbf{s}_{k+1} = \mathbf{r}_{k+1} + \beta_k \mathbf{s}_k \tag{2.42}$$

The constant $\beta_k$ is chosen so that the two successive search directions are conjugate 
(noninterfering) to each other, meaning $\mathbf{s}_{k+1}^T \mathbf{A} \mathbf{s}_k = 0$. Substituting for $\mathbf{s}_{k+1}$ from 
Eq. (2.42), we get $(\mathbf{r}_{k+1}^T + \beta_k \mathbf{s}_k^T)\mathbf{A}\mathbf{s}_k = 0$, which yields 

$$\beta_k = -\frac{\mathbf{r}_{k+1}^T \mathbf{A} \mathbf{s}_k}{\mathbf{s}_k^T \mathbf{A} \mathbf{s}_k} \tag{2.43}$$

Here is the outline of the conjugate gradient algorithm: 


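• Choose $\mathbf{x}_0$ (any vector will do, but one close to solution results in fewer iterations) 

• $\mathbf{r}_0 \leftarrow \mathbf{b} - \mathbf{A}\mathbf{x}_0$ 

• $\mathbf{s}_0 \leftarrow \mathbf{r}_0$ (lacking a previous search direction, choose the direction of steepest 
descent) 

• do with k = 0, 1, 2, ... 

    $\alpha_k \leftarrow \dfrac{\mathbf{s}_k^T \mathbf{r}_k}{\mathbf{s}_k^T \mathbf{A} \mathbf{s}_k}$ 

    $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha_k \mathbf{s}_k$ 

    $\mathbf{r}_{k+1} \leftarrow \mathbf{b} - \mathbf{A}\mathbf{x}_{k+1}$ 

    if $|\mathbf{r}_{k+1}| < \varepsilon$ exit loop (convergence criterion; $\varepsilon$ is the error tolerance) 

    $\beta_k \leftarrow -\dfrac{\mathbf{r}_{k+1}^T \mathbf{A} \mathbf{s}_k}{\mathbf{s}_k^T \mathbf{A} \mathbf{s}_k}$ 

    $\mathbf{s}_{k+1} \leftarrow \mathbf{r}_{k+1} + \beta_k \mathbf{s}_k$ 

• end do 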


It can be shown that the residual vectors $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots$ produced by the algorithm 
are mutually orthogonal; i.e., $\mathbf{r}_i \cdot \mathbf{r}_j = 0$, $i \ne j$. Now suppose that we have carried out 
enough iterations to have computed the whole set of n residual vectors. The residual 
resulting from the next iteration must be a null vector ($\mathbf{r}_{n+1} = \mathbf{0}$), indicating that the 
solution has been obtained. It thus appears that the conjugate gradient algorithm 
is not an iterative method at all, since it reaches the exact solution after n 
computational cycles. In practice, however, convergence is usually achieved in less than n 
iterations. 

The conjugate gradient method is not competitive with direct methods in the 
solution of small sets of equations. Its strength lies in the handling of large, sparse 
systems (where most elements of A are zero). It is important to note that A enters the 
algorithm only through its multiplication by a vector; i.e., in the form Av, where v is 
a vector (either $\mathbf{x}_{k+1}$ or $\mathbf{s}_k$). If A is sparse, it is possible to write an efficient subroutine 
for the multiplication and pass it on to the conjugate gradient algorithm. 


■ conjGrad 

The function conjGrad shown below implements the conjugate gradient algorithm. 
The maximum allowable number of iterations is set to n. Note that conjGrad calls 
the function Av(v) which returns the product Av. This function must be supplied by 
the user (see Example 2.18). We must also supply the starting vector x and the constant 
(right-hand-side) vector b. 

function [x,numIter] = conjGrad(func,x,b,epsilon)
% Solves Ax = b by conjugate gradient method.
% USAGE: [x,numIter] = conjGrad(func,x,b,epsilon)
% INPUT:
% func    = handle of function that returns the vector A*v
% x       = starting solution vector
% b       = constant vector in A*x = b
% epsilon = error tolerance (default = 1.0e-9)
% OUTPUT:
% x       = solution vector
% numIter = number of iterations carried out

if nargin == 3; epsilon = 1.0e-9; end
n = length(b);
r = b - feval(func,x); s = r;
for numIter = 1:n
    u = feval(func,s);
    alpha = dot(s,r)/dot(s,u);
    x = x + alpha*s;
    r = b - feval(func,x);
    if sqrt(dot(r,r)) < epsilon
        return
    else
        beta = -dot(r,u)/dot(s,u);
        s = r + beta*s;
    end
end
error('Too many iterations')

EXAMPLE 2.15 

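Solve the equations 

$$\begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix}$$

by the Gauss-Seidel method without relaxation. 
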
Solution With the given data, the iteration formulas in Eq. (2.34) become 

$$\begin{aligned} x_1 &= \frac{1}{4}(12 + x_2 - x_3) \\ x_2 &= \frac{1}{4}(-1 + x_1 + 2x_3) \\ x_3 &= \frac{1}{4}(5 - x_1 + 2x_2) \end{aligned}$$

Choosing the starting values $x_1 = x_2 = x_3 = 0$, we have for the first iteration 

$$\begin{aligned} x_1 &= \frac{1}{4}(12 + 0 - 0) = 3 \\ x_2 &= \frac{1}{4}[-1 + 3 + 2(0)] = 0.5 \\ x_3 &= \frac{1}{4}[5 - 3 + 2(0.5)] = 0.75 \end{aligned}$$

The second iteration yields 

$$\begin{aligned} x_1 &= \frac{1}{4}(12 + 0.5 - 0.75) = 2.9375 \\ x_2 &= \frac{1}{4}[-1 + 2.9375 + 2(0.75)] = 0.85938 \\ x_3 &= \frac{1}{4}[5 - 2.9375 + 2(0.85938)] = 0.94531 \end{aligned}$$

and the third iteration results in 

$$\begin{aligned} x_1 &= \frac{1}{4}(12 + 0.85938 - 0.94531) = 2.97852 \\ x_2 &= \frac{1}{4}[-1 + 2.97852 + 2(0.94531)] = 0.96729 \\ x_3 &= \frac{1}{4}[5 - 2.97852 + 2(0.96729)] = 0.98902 \end{aligned}$$

After five more iterations the results would agree with the exact solution $x_1 = 3$, 
$x_2 = x_3 = 1$ within five decimal places. 
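
The iterations above can be reproduced with a few lines of MATLAB. The following is a minimal sketch that applies Eq. (2.34) directly to the stored matrix (no relaxation, fixed number of cycles); the array hist, which keeps the first three iterates for comparison with the hand computation, and the index vector j are illustrative names.

```matlab
A = [4 -1 1; -1 4 -2; 1 -2 4];
b = [12; -1; 5];
n = length(b);
x = zeros(n,1);                 % starting vector
hist = zeros(n,3);              % iterates of the first three cycles

for iter = 1:8
    for i = 1:n
        j = [1:i-1, i+1:n];     % all column indices except i
        % Eq. (2.34), always using the latest available values of x
        x(i) = (b(i) - A(i,j)*x(j))/A(i,i);
    end
    if iter <= 3; hist(:,iter) = x; end
end
```

The columns of hist correspond to the first, second and third iterations of the example; after eight cycles x is close to the exact solution.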

EXAMPLE 2.16 

Solve the equations in Example 2.15 by the conjugate gradient method. 


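Solution The conjugate gradient method should converge after three iterations.
Choosing again for the starting vector

$$\mathbf{x}_0 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}^T$$

the computations outlined in the text proceed as follows:

$$\mathbf{r}_0 = \mathbf{b} - \mathbf{A}\mathbf{x}_0 = \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix}$$

$$\mathbf{s}_0 = \mathbf{r}_0 = \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix}$$

$$\mathbf{A}\mathbf{s}_0 = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} 54 \\ -26 \\ 34 \end{bmatrix}$$

$$\alpha_0 = \frac{\mathbf{s}_0^T \mathbf{r}_0}{\mathbf{s}_0^T \mathbf{A}\mathbf{s}_0} = \frac{12^2 + (-1)^2 + 5^2}{12(54) + (-1)(-26) + 5(34)} = 0.20142$$

$$\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0 \mathbf{s}_0 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + 0.20142 \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} 2.41704 \\ -0.20142 \\ 1.00710 \end{bmatrix}$$

$$\mathbf{r}_1 = \mathbf{b} - \mathbf{A}\mathbf{x}_1 = \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 2.41704 \\ -0.20142 \\ 1.00710 \end{bmatrix} = \begin{bmatrix} 1.12332 \\ 4.23692 \\ -1.84828 \end{bmatrix}$$

$$\beta_0 = -\frac{\mathbf{r}_1^T \mathbf{A}\mathbf{s}_0}{\mathbf{s}_0^T \mathbf{A}\mathbf{s}_0} = -\frac{1.12332(54) + 4.23692(-26) - 1.84828(34)}{12(54) + (-1)(-26) + 5(34)} = 0.133107$$

$$\mathbf{s}_1 = \mathbf{r}_1 + \beta_0 \mathbf{s}_0 = \begin{bmatrix} 1.12332 \\ 4.23692 \\ -1.84828 \end{bmatrix} + 0.133107 \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} 2.72076 \\ 4.10380 \\ -1.18268 \end{bmatrix}$$

$$\mathbf{A}\mathbf{s}_1 = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 2.72076 \\ 4.10380 \\ -1.18268 \end{bmatrix} = \begin{bmatrix} 5.59656 \\ 16.05980 \\ -10.21760 \end{bmatrix}$$

$$\alpha_1 = \frac{\mathbf{s}_1^T \mathbf{r}_1}{\mathbf{s}_1^T \mathbf{A}\mathbf{s}_1} = \frac{2.72076(1.12332) + 4.10380(4.23692) + (-1.18268)(-1.84828)}{2.72076(5.59656) + 4.10380(16.05980) + (-1.18268)(-10.21760)} = 0.24276$$

$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1 \mathbf{s}_1 = \begin{bmatrix} 2.41704 \\ -0.20142 \\ 1.00710 \end{bmatrix} + 0.24276 \begin{bmatrix} 2.72076 \\ 4.10380 \\ -1.18268 \end{bmatrix} = \begin{bmatrix} 3.07753 \\ 0.79482 \\ 0.71999 \end{bmatrix}$$

$$\mathbf{r}_2 = \mathbf{b} - \mathbf{A}\mathbf{x}_2 = \begin{bmatrix} 12 \\ -1 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} 3.07753 \\ 0.79482 \\ 0.71999 \end{bmatrix} = \begin{bmatrix} -0.23529 \\ 0.33823 \\ 0.63215 \end{bmatrix}$$

$$\beta_1 = -\frac{\mathbf{r}_2^T \mathbf{A}\mathbf{s}_1}{\mathbf{s}_1^T \mathbf{A}\mathbf{s}_1} = -\frac{(-0.23529)(5.59656) + 0.33823(16.05980) + 0.63215(-10.21760)}{2.72076(5.59656) + 4.10380(16.05980) + (-1.18268)(-10.21760)} = 0.0251452$$

$$\mathbf{s}_2 = \mathbf{r}_2 + \beta_1 \mathbf{s}_1 = \begin{bmatrix} -0.23529 \\ 0.33823 \\ 0.63215 \end{bmatrix} + 0.0251452 \begin{bmatrix} 2.72076 \\ 4.10380 \\ -1.18268 \end{bmatrix} = \begin{bmatrix} -0.166876 \\ 0.441421 \\ 0.602411 \end{bmatrix}$$

$$\mathbf{A}\mathbf{s}_2 = \begin{bmatrix} 4 & -1 & 1 \\ -1 & 4 & -2 \\ 1 & -2 & 4 \end{bmatrix} \begin{bmatrix} -0.166876 \\ 0.441421 \\ 0.602411 \end{bmatrix} = \begin{bmatrix} -0.506514 \\ 0.727738 \\ 1.359930 \end{bmatrix}$$

$$\alpha_2 = \frac{\mathbf{r}_2^T \mathbf{s}_2}{\mathbf{s}_2^T \mathbf{A}\mathbf{s}_2} = \frac{(-0.23529)(-0.166876) + 0.33823(0.441421) + 0.63215(0.602411)}{(-0.166876)(-0.506514) + 0.441421(0.727738) + 0.602411(1.359930)} = 0.46480$$

$$\mathbf{x}_3 = \mathbf{x}_2 + \alpha_2 \mathbf{s}_2 = \begin{bmatrix} 3.07753 \\ 0.79482 \\ 0.71999 \end{bmatrix} + 0.46480 \begin{bmatrix} -0.166876 \\ 0.441421 \\ 0.602411 \end{bmatrix} = \begin{bmatrix} 2.99997 \\ 0.99999 \\ 0.99999 \end{bmatrix}$$

The solution $\mathbf{x}_3$ is correct to almost five decimal places. The small discrepancy is
caused by roundoff errors in the computations.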
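The hand computation above can be repeated in double precision with a few lines of MATLAB. The script below is only a sketch of the iteration used in this example, not the conjGrad function referred to elsewhere in the text; the array name alphas is an assumption introduced here to keep the step lengths for inspection.

```matlab
% Conjugate gradient iterations for the 3 x 3 system of Example 2.16.
A = [4 -1 1; -1 4 -2; 1 -2 4];
b = [12; -1; 5];
x = zeros(3,1);                  % starting vector x0 = 0
r = b - A*x;                     % initial residual r0
s = r;                           % first search direction s0 = r0
alphas = zeros(3,1);             % step lengths (hypothetical helper array)
for k = 1:3                      % n = 3 steps suffice in exact arithmetic
    As    = A*s;
    alpha = (s'*r)/(s'*As);      % step length along s
    x     = x + alpha*s;         % improved solution
    r     = b - A*x;             % new residual
    beta  = -(r'*As)/(s'*As);    % keeps the next direction A-conjugate
    s     = r + beta*s;
    alphas(k) = alpha;
    fprintf('k = %d  alpha = %.5f  x = [%.5f %.5f %.5f]\n', k, alpha, x);
end
```

Because the arithmetic is carried out with about 16 significant digits, the third iterate should agree with the exact solution far more closely than the five-digit hand computation does.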

EXAMPLE 2.17 

Write a computer program to solve the following n simultaneous equations³ by
the Gauss-Seidel method with relaxation (the program should work with any
value of n):

$$\begin{bmatrix}
2 & -1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
-1 & 2 & -1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1 & 2 & -1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & -1 & 2 & -1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & -1 & 2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-2} \\ x_{n-1} \\ x_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

Run the program with n = 20. The exact solution can be shown to be $x_i = -n/4 + i/2$,
$i = 1, 2, \ldots, n$.

³ Equations of this form are called cyclic tridiagonal. They occur in the finite difference formulation
of second-order differential equations with periodic boundary conditions.

Solution In this case the iterative formulas in Eq. (2.35) are

$$x_1 = \omega(x_2 - x_n)/2 + (1 - \omega)x_1$$

$$x_i = \omega(x_{i-1} + x_{i+1})/2 + (1 - \omega)x_i, \quad i = 2, 3, \ldots, n - 1 \tag{a}$$

$$x_n = \omega(1 - x_1 + x_{n-1})/2 + (1 - \omega)x_n$$

which are evaluated by the following function:

function x = fex2_17(x,omega)
% Iteration formula Eq. (2.35) for Example 2.17.

n = length(x);
x(1) = omega*(x(2) - x(n))/2 + (1-omega)*x(1);
for i = 2:n-1
    x(i) = omega*(x(i-1) + x(i+1))/2 + (1-omega)*x(i);
end
x(n) = omega*(1 - x(1) + x(n-1))/2 + (1-omega)*x(n);

The solution can be obtained with a single command (note that x = 0 is the
starting vector):

>> [x,numIter,omega] = gaussSeidel(@fex2_17,zeros(20,1))

resulting in 

x =
   -4.5000
   -4.0000
   -3.5000
   -3.0000
   -2.5000
   -2.0000
   -1.5000
   -1.0000
   -0.5000
    0.0000
    0.5000
    1.0000
    1.5000
    2.0000
    2.5000
    3.0000
    3.5000
    4.0000
    4.5000
    5.0000

numIter =
   259

omega =
    1.7055

The convergence is very slow, because the coefficient matrix lacks diagonal 
dominance—substituting the elements of A in Eq. (2.30) produces an equality rather 
than the desired inequality. If we were to change each diagonal term of the coefficient 
matrix from 2 to 4, A would be diagonally dominant and the solution would converge 
in only 22 iterations. 
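The iteration formulas (a) can also be exercised without the gaussSeidel driver. The script below is a minimal sketch under two assumptions that are not part of the example: the relaxation factor is fixed by hand at omega = 1.7 instead of being estimated during the run, and a fixed number of sweeps replaces the convergence test.

```matlab
% Gauss-Seidel sweeps with a fixed relaxation factor for the cyclic
% tridiagonal system of Example 2.17 (n = 20).
n     = 20;
omega = 1.7;                  % assumed value, picked by hand
x     = zeros(n,1);           % starting vector
for sweep = 1:2000            % fixed number of sweeps, no convergence test
    x(1) = omega*(x(2) - x(n))/2 + (1 - omega)*x(1);
    for i = 2:n-1
        x(i) = omega*(x(i-1) + x(i+1))/2 + (1 - omega)*x(i);
    end
    x(n) = omega*(1 - x(1) + x(n-1))/2 + (1 - omega)*x(n);
end
xExact = -n/4 + (1:n)'/2;     % exact solution quoted in the example
maxErr = max(abs(x - xExact));
fprintf('largest error after 2000 sweeps: %.2e\n', maxErr);
```

Changing omega to 1 in this sketch gives the plain Gauss-Seidel method, which is a simple way to see how much the relaxation factor matters for this matrix.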

EXAMPLE 2.18 

Solve Example 2.17 with the conjugate gradient method, also using n = 20. 

Solution For the given A, the components of the vector Av are

$$(\mathbf{A}\mathbf{v})_1 = 2v_1 - v_2 + v_n$$

$$(\mathbf{A}\mathbf{v})_i = -v_{i-1} + 2v_i - v_{i+1}, \quad i = 2, 3, \ldots, n - 1$$

$$(\mathbf{A}\mathbf{v})_n = -v_{n-1} + 2v_n + v_1$$

which are evaluated by the following function:

function Av = fex2_18(v)
% Computes the product A*v in Example 2.18

n = length(v);
Av = zeros(n,1);
Av(1) = 2*v(1) - v(2) + v(n);
Av(2:n-1) = -v(1:n-2) + 2*v(2:n-1) - v(3:n);
Av(n) = -v(n-1) + 2*v(n) + v(1);

The program shown below utilizes the function conjGrad. The solution vector x
is initialized to zero in the program, which also sets up the constant vector b.

% Example 2.18 (Conjugate gradient method)
n = 20;
x = zeros(n,1);
b = zeros(n,1); b(n) = 1;
[x,numIter] = conjGrad(@fex2_18,x,b)

Running the program results in

x =
   -4.5000
   -4.0000
   -3.5000
   -3.0000
   -2.5000
   -2.0000
   -1.5000
   -1.0000
   -0.5000
         0
    0.5000
    1.0000
    1.5000
    2.0000
    2.5000
    3.0000
    3.5000
    4.0000
    4.5000
    5.0000

numIter =
    10

PROBLEM SET 2.3

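1. Let

$$\mathbf{A} = \begin{bmatrix} 3 & -1 & 2 \\ 0 & 1 & 3 \\ -2 & 2 & -4 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 0 & 1 & 3 \\ 3 & -1 & 2 \\ -2 & 2 & -4 \end{bmatrix}$$

(note that B is obtained by interchanging the first two rows of A). Knowing that

$$\mathbf{A}^{-1} = \begin{bmatrix} 0.5 & 0 & 0.25 \\ 0.3 & 0.4 & 0.45 \\ -0.1 & 0.2 & -0.15 \end{bmatrix}$$

determine $\mathbf{B}^{-1}$.

2. Invert the triangular matrices

$$\mathbf{A} = \begin{bmatrix} 2 & 4 & 3 \\ 0 & 6 & 5 \\ 0 & 0 & 2 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 2 & 0 & 0 \\ 3 & 4 & 0 \\ 4 & 5 & 6 \end{bmatrix}$$

3. Invert the triangular matrix

$$\mathbf{A} = \begin{bmatrix} 1 & 1/2 & 1/4 & 1/8 \\ 0 & 1 & 1/3 & 1/9 \\ 0 & 0 & 1 & 1/4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

4. Invert the following matrices:

$$\text{(a)} \ \mathbf{A} = \begin{bmatrix} 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{bmatrix} \qquad \text{(b)} \ \mathbf{B} = \begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}$$

5. Invert the matrix

$$\mathbf{A} = \begin{bmatrix} 4 & -2 & 1 \\ -2 & 1 & -1 \\ 1 & -2 & 4 \end{bmatrix}$$

6. ■ Invert the following matrices with any method:

$$\mathbf{A} = \begin{bmatrix} 5 & -3 & -1 & 0 \\ -2 & 1 & 1 & 1 \\ 3 & -5 & 1 & 2 \\ 0 & 8 & -4 & -3 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 4 & -1 & 0 & 0 \\ -1 & 4 & -1 & 0 \\ 0 & -1 & 4 & -1 \\ 0 & 0 & -1 & 4 \end{bmatrix}$$

7. ■ Invert the matrix with any method:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 & -9 & 6 & 4 \\ 2 & -1 & 6 & 7 & 1 \\ 3 & 2 & -3 & 15 & 5 \\ 8 & -1 & 1 & 4 & 2 \\ 11 & 1 & -2 & 18 & 7 \end{bmatrix}$$

and comment on the reliability of the result.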

8. ■ The joint displacements u of the plane truss in Prob. 14, Problem Set 2.2 are
related to the applied joint forces p by

$$\mathbf{K}\mathbf{u} = \mathbf{p} \tag{a}$$

where

$$\mathbf{K} = \begin{bmatrix}
27.580 & 7.004 & -7.004 & 0.000 & 0.000 \\
7.004 & 29.570 & -5.253 & 0.000 & -24.320 \\
-7.004 & -5.253 & 29.570 & 0.000 & 0.000 \\
0.000 & 0.000 & 0.000 & 27.580 & -7.004 \\
0.000 & -24.320 & 0.000 & -7.004 & 29.570
\end{bmatrix} \ \text{MN/m}$$

is called the stiffness matrix of the truss. If Eq. (a) is inverted by multiplying each
side by $\mathbf{K}^{-1}$, we obtain $\mathbf{u} = \mathbf{K}^{-1}\mathbf{p}$, where $\mathbf{K}^{-1}$ is known as the flexibility matrix. The
physical meaning of the elements of the flexibility matrix is: $K^{-1}_{ij}$ = displacements
$u_i$ ($i = 1, 2, \ldots, 5$) produced by the unit load $p_j = 1$. Compute (a) the flexibility
matrix of the truss; (b) the displacements of the joints due to the load $p_5 = -45$ kN
(the load shown in Problem 14, Problem Set 2.2).

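9. ■ Invert the matrices

$$\mathbf{A} = \begin{bmatrix} 3 & -7 & 45 & 21 \\ 12 & 11 & 10 & 17 \\ 6 & 25 & -80 & -24 \\ 17 & 55 & -9 & 7 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 2 & 3 & 4 & 4 \\ 4 & 5 & 6 & 7 \end{bmatrix}$$

10. ■ Write a program for inverting an n × n lower triangular matrix. The inversion
procedure should contain only forward substitution. Test the program by inverting
the matrix

$$\mathbf{A} = \begin{bmatrix} 36 & 0 & 0 & 0 \\ 18 & 36 & 0 & 0 \\ 9 & 12 & 36 & 0 \\ 5 & 4 & 9 & 36 \end{bmatrix}$$

Let the program also check the result by computing and printing $\mathbf{A}\mathbf{A}^{-1}$.

11. Use the Gauss-Seidel method to solve

$$\begin{bmatrix} -2 & 5 & 9 \\ 7 & 1 & 1 \\ -3 & 7 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ -26 \end{bmatrix}$$

12. Solve the following equations with the Gauss-Seidel method:

$$\begin{bmatrix} 12 & -2 & 3 & 1 \\ -2 & 15 & 6 & -3 \\ 1 & 6 & 20 & -4 \\ 0 & -3 & 2 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 20 \\ 0 \end{bmatrix}$$

13. Use the Gauss-Seidel method with relaxation to solve Ax = b, where

$$\mathbf{A} = \begin{bmatrix} 4 & -1 & 0 & 0 \\ -1 & 4 & -1 & 0 \\ 0 & -1 & 4 & -1 \\ 0 & 0 & -1 & 3 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 15 \\ 10 \\ 10 \\ 10 \end{bmatrix}$$

Take $x_i = b_i/A_{ii}$ as the starting vector and use $\omega = 1.1$ for the relaxation factor.

14. Solve the equations

$$\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

by the conjugate gradient method. Start with x = 0.

15. Use the conjugate gradient method to solve

$$\begin{bmatrix} 3 & 0 & -1 \\ 0 & 4 & -2 \\ -1 & -2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 10 \\ -10 \end{bmatrix}$$

starting with x = 0.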

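16. ■ Solve the simultaneous equations Ax = b and Bx = b by the Gauss-Seidel
method with relaxation, where

$$\mathbf{b} = \begin{bmatrix} 10 & -8 & 10 & 10 & -8 & 10 \end{bmatrix}^T$$

$$\mathbf{A} = \begin{bmatrix}
3 & -2 & 1 & 0 & 0 & 0 \\
2 & 4 & -2 & 1 & 0 & 0 \\
1 & -2 & 4 & -2 & 1 & 0 \\
0 & 1 & -2 & 4 & -2 & 1 \\
0 & 0 & 1 & -2 & 4 & -2 \\
0 & 0 & 0 & 1 & -2 & 3
\end{bmatrix}$$

$$\mathbf{B} = \begin{bmatrix}
3 & -2 & 1 & 0 & 0 & 1 \\
2 & 4 & -2 & 1 & 0 & 0 \\
1 & -2 & 4 & -2 & 1 & 0 \\
0 & 1 & -2 & 4 & -2 & 1 \\
0 & 0 & 1 & -2 & 4 & -2 \\
1 & 0 & 0 & 1 & -2 & 3
\end{bmatrix}$$

Note that A is not diagonally dominant, but that does not necessarily preclude
convergence.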

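17. ■ Modify the program in Example 2.17 (Gauss-Seidel method) so that it will solve
the following equations:

$$\begin{bmatrix}
4 & -1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
-1 & 4 & -1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & -1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & -1 & 4 & -1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & -1 & 4
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-2} \\ x_{n-1} \\ x_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 100 \end{bmatrix}$$

Run the program with n = 20 and compare the number of iterations with Example
2.17.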

18. ■ Modify the program in Example 2.18 to solve the equations in Prob. 17 by the
conjugate gradient method. Run the program with n = 20.

19. ■

[Figure: square plate with a 3 × 3 mesh of interior points numbered 1 to 9; the edges are labeled T = 0°, T = 100° and T = 200°.]

The edges of the square plate are kept at the temperatures shown. Assuming
steady-state heat conduction, the differential equation governing the temperature
T in the interior is

$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = 0$$

If this equation is approximated by finite differences using the mesh shown,
we obtain the following algebraic equations for temperatures at the mesh
points:

$$\begin{bmatrix}
-4 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & -4 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & -4 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & -4 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & -4 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & -4 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & -4 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & -4 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & -4
\end{bmatrix}
\begin{bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \\ T_5 \\ T_6 \\ T_7 \\ T_8 \\ T_9 \end{bmatrix}
= -
\begin{bmatrix} 0 \\ 0 \\ 100 \\ 0 \\ 0 \\ 100 \\ 200 \\ 200 \\ 300 \end{bmatrix}$$

Solve these equations with the conjugate gradient method.


MATLAB Functions

x = A\b returns the solution x of Ax = b, obtained by Gauss elimination. If the
equations are overdetermined (A has more rows than columns), the least-squares
solution is computed.

[L,U] = lu(A) Doolittle's decomposition A = LU. On return, U is an upper
triangular matrix and L contains a row-wise permutation of the lower triangular
matrix.

[M,U,P] = lu(A) returns the same U as above, but now M is a lower triangular matrix
and P is the permutation matrix so that M = P*L. Note that here P*A = M*U.

R = chol(A) Choleski's decomposition. By default MATLAB returns the upper
triangular factor R, so that $\mathbf{A} = \mathbf{R}^T\mathbf{R}$; the lower triangular factor in
$\mathbf{A} = \mathbf{L}\mathbf{L}^T$ is obtained with L = chol(A,'lower').

B = inv(A) returns B as the inverse of A (the method used is not specified).

n = norm(A,1) returns the norm $n = \max_j \sum_i |A_{ij}|$ (largest sum of the absolute
values of the elements in a column of A).

c = cond(A) returns the condition number of the matrix A.

MATLAB does not cater to banded matrices explicitly. However, banded matrices
can be treated as sparse matrices, for which MATLAB provides extensive support. A
banded matrix in sparse form can be created by the following command:

A = spdiags(B,d,n,n) creates an n × n sparse matrix from the columns of matrix
B by placing the columns along the diagonals specified by d. The columns of B
may be longer than the diagonals they represent. A diagonal in the upper part
of A takes its elements from the lower part of a column of B, while a lower diagonal
uses the upper part of B.


Here is an example of creating the 5 × 5 tridiagonal matrix

$$\mathbf{A} = \begin{bmatrix}
2 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 \\
0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & -1 & 2
\end{bmatrix}$$

>> c = ones(5,1);
>> A = spdiags([-c 2*c -c],[-1 0 1],5,5)
A =
   (1,1)        2
   (2,1)       -1
   (1,2)       -1
   (2,2)        2
   (3,2)       -1
   (2,3)       -1
   (3,3)        2
   (4,3)       -1
   (3,4)       -1
   (4,4)        2
   (5,4)       -1
   (4,5)       -1
   (5,5)        2


If the matrix is declared sparse, MATLAB stores only the nonzero elements of the
matrix together with information locating the position of each element in the matrix.
The printout of a sparse matrix displays the values of these elements and their indices
(row and column numbers) in parentheses.

Almost all matrix functions, including the ones listed above, also work on sparse
matrices. For example, [L,U] = lu(A) would return L and U in sparse matrix
representation if A is a sparse matrix. There are many sparse matrix functions in
MATLAB; here are just a few of them:

A = full(S) converts the sparse matrix S into a full matrix A.

S = sparse(A) converts the full matrix A into a sparse matrix S.

x = lsqr(A,b) conjugate gradient method for solving Ax = b.

spy(S) draws a map of the nonzero elements of S.
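As a brief illustration of working with a banded matrix in sparse form, the following sketch builds the 5 × 5 tridiagonal matrix with spdiags and solves a system with the backslash operator. The right-hand side b is an assumption chosen here so that the solution is a vector of ones; it does not come from the text.

```matlab
% Build the 5 x 5 tridiagonal matrix in sparse form and solve A*x = b.
n = 5;
c = ones(n,1);
A = spdiags([-c 2*c -c], [-1 0 1], n, n);   % only the nonzeros are stored
b = [1; 0; 0; 0; 1];                        % assumed right-hand side
x = A\b;                                    % backslash accepts a sparse A
resid = norm(full(A)*x - b);                % check against the full matrix
fprintf('nonzeros stored: %d, residual: %.1e\n', nnz(A), resid);
```

The count returned by nnz should match the 13 entries in the printout of A, compared with the 25 elements held by the full matrix.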




Interpolation and Curve Fitting 


Given the n data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, estimate $y(x)$.


3.1 Introduction 

Discrete data sets, or tables of the form

$$\begin{array}{ccccc}
x_1 & x_2 & x_3 & \cdots & x_n \\
y_1 & y_2 & y_3 & \cdots & y_n
\end{array}$$

are commonly involved in technical calculations. The source of the data may be
experimental observations or numerical computations. There is a distinction between
interpolation and curve fitting. In interpolation we construct a curve through the data
points. In doing so, we make the implicit assumption that the data points are accurate
and distinct. Curve fitting is applied to data that contain scatter (noise), usually due to
measurement errors. Here we want to find a smooth curve that approximates the data
in some sense. Thus the curve does not have to hit the data points. This difference
between interpolation and curve fitting is illustrated in Fig. 3.1.


3.2 Polynomial Interpolation 

Lagrange's Method 

The simplest form of an interpolant is a polynomial. It is always possible to construct a
unique polynomial $P_{n-1}(x)$ of degree at most n − 1 that passes through n distinct data points.
One means of obtaining this polynomial is the formula of Lagrange

$$P_{n-1}(x) = \sum_{i=1}^{n} y_i \ell_i(x) \tag{3.1a}$$

where

$$\ell_i(x) = \frac{x - x_1}{x_i - x_1} \cdot \frac{x - x_2}{x_i - x_2} \cdots \frac{x - x_{i-1}}{x_i - x_{i-1}} \cdot \frac{x - x_{i+1}}{x_i - x_{i+1}} \cdots \frac{x - x_n}{x_i - x_n} = \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}, \quad i = 1, 2, \ldots, n \tag{3.1b}$$

are called the cardinal functions.

For example, if n = 2, the interpolant is the straight line $P_1(x) = y_1\ell_1(x) + y_2\ell_2(x)$,
where

$$\ell_1(x) = \frac{x - x_2}{x_1 - x_2} \qquad \ell_2(x) = \frac{x - x_1}{x_2 - x_1}$$

With n = 3, interpolation is parabolic: $P_2(x) = y_1\ell_1(x) + y_2\ell_2(x) + y_3\ell_3(x)$, where now

$$\ell_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}$$

$$\ell_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}$$

$$\ell_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$

The cardinal functions are polynomials of degree n − 1 and have the property

$$\ell_i(x_j) = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} = \delta_{ij} \tag{3.2}$$

where $\delta_{ij}$ is the Kronecker delta. This property is illustrated in Fig. 3.2 for three-point
interpolation (n = 3) with $x_1 = 0$, $x_2 = 2$ and $x_3 = 3$.

Figure 3.2. Example of quadratic cardinal functions.

To prove that the interpolating polynomial passes through the data points, we
substitute $x = x_j$ into Eq. (3.1a) and then utilize Eq. (3.2). The result is

$$P_{n-1}(x_j) = \sum_{i=1}^{n} y_i \ell_i(x_j) = \sum_{i=1}^{n} y_i \delta_{ij} = y_j$$

It can be shown that the error in polynomial interpolation is

$$f(x) - P_{n-1}(x) = \frac{(x - x_1)(x - x_2) \cdots (x - x_n)}{n!} f^{(n)}(\xi) \tag{3.3}$$

where $\xi$ lies somewhere in the interval $(x_1, x_n)$; its value is otherwise unknown. It is
instructive to note that the farther a data point is from x, the more it contributes to
the error at x.
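The cardinal-function property in Eq. (3.2) and the evaluation of Eq. (3.1a) can be checked numerically. The sketch below uses the nodes 0, 2 and 3; the ordinates 7, 11 and 28 and the names L and P are assumptions introduced for the illustration.

```matlab
% Cardinal functions and Lagrange interpolant for three data points.
xData = [0 2 3];             % nodes
yData = [7 11 28];           % assumed ordinates, for illustration only
n = length(xData);
L = zeros(n,n);              % L(i,j) will hold l_i(x_j)
for i = 1:n
    for j = 1:n
        L(i,j) = 1;
        for m = [1:i-1, i+1:n]        % product over all nodes except i
            L(i,j) = L(i,j)*(xData(j) - xData(m))/(xData(i) - xData(m));
        end
    end
end
disp(L)                      % Kronecker delta: expected to be the identity

% Interpolant P(x) = sum of y_i*l_i(x), evaluated at x = 1
x = 1;  P = 0;
for i = 1:n
    li = 1;
    for m = [1:i-1, i+1:n]
        li = li*(x - xData(m))/(xData(i) - xData(m));
    end
    P = P + yData(i)*li;
end
fprintf('P(1) = %g\n', P);
```

Each evaluation costs on the order of n² operations, which is one reason the text turns to Newton's form for repeated use.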


Newton's Method 

Evaluation of polynomial 

Although Lagrange's method is conceptually simple, it does not lend itself to an
efficient algorithm. A better computational procedure is obtained with Newton's method,
where the interpolating polynomial is written in the form

$$P_{n-1}(x) = a_1 + (x - x_1)a_2 + (x - x_1)(x - x_2)a_3 + \cdots + (x - x_1)(x - x_2) \cdots (x - x_{n-1})a_n$$

This polynomial lends itself to an efficient evaluation procedure. Consider, for
example, four data points (n = 4). Here the interpolating polynomial is

$$P_3(x) = a_1 + (x - x_1)a_2 + (x - x_1)(x - x_2)a_3 + (x - x_1)(x - x_2)(x - x_3)a_4$$

$$= a_1 + (x - x_1)\{a_2 + (x - x_2)[a_3 + (x - x_3)a_4]\}$$

which can be evaluated backward with the following recurrence relations:

$$P_0(x) = a_4$$

$$P_1(x) = a_3 + (x - x_3)P_0(x)$$

$$P_2(x) = a_2 + (x - x_2)P_1(x)$$

$$P_3(x) = a_1 + (x - x_1)P_2(x)$$

For arbitrary n we have

$$P_0(x) = a_n \qquad P_k(x) = a_{n-k} + (x - x_{n-k})P_{k-1}(x), \quad k = 1, 2, \ldots, n - 1 \tag{3.4}$$


■ newtonPoly

Denoting the x-coordinate array of the data points by xData, and the number of data
points by n, we have the following algorithm for computing $P_{n-1}(x)$:

function p = newtonPoly(a,xData,x)
% Returns value of Newton's polynomial at x.
% USAGE: p = newtonPoly(a,xData,x)
% a = coefficient array of the polynomial;
%     must be computed first by newtonCoeff.
% xData = x-coordinates of data points.

n = length(xData);
p = a(n);
for k = 1:n-1
    p = a(n-k) + (x - xData(n-k))*p;
end

Computation of coefficients 

The coefficients of $P_{n-1}(x)$ are determined by forcing the polynomial to pass through
each data point: $y_i = P_{n-1}(x_i)$, $i = 1, 2, \ldots, n$. This yields the simultaneous equations

$$y_1 = a_1$$

$$y_2 = a_1 + (x_2 - x_1)a_2$$

$$y_3 = a_1 + (x_3 - x_1)a_2 + (x_3 - x_1)(x_3 - x_2)a_3 \tag{a}$$

$$\vdots$$

$$y_n = a_1 + (x_n - x_1)a_2 + \cdots + (x_n - x_1)(x_n - x_2) \cdots (x_n - x_{n-1})a_n$$

Introducing the divided differences

$$\nabla y_i = \frac{y_i - y_1}{x_i - x_1}, \quad i = 2, 3, \ldots, n$$

$$\nabla^2 y_i = \frac{\nabla y_i - \nabla y_2}{x_i - x_2}, \quad i = 3, 4, \ldots, n$$

$$\nabla^3 y_i = \frac{\nabla^2 y_i - \nabla^2 y_3}{x_i - x_3}, \quad i = 4, 5, \ldots, n \tag{3.5}$$

$$\vdots$$

$$\nabla^{n-1} y_n = \frac{\nabla^{n-2} y_n - \nabla^{n-2} y_{n-1}}{x_n - x_{n-1}}$$

the solution of Eqs. (a) is

$$a_1 = y_1 \qquad a_2 = \nabla y_2 \qquad a_3 = \nabla^2 y_3 \qquad \cdots \qquad a_n = \nabla^{n-1} y_n \tag{3.6}$$

If the coefficients are computed by hand, it is convenient to work with the format in
Table 3.1 (shown for n = 5).

$$\begin{array}{cccccc}
x_1 & y_1 & & & & \\
x_2 & y_2 & \nabla y_2 & & & \\
x_3 & y_3 & \nabla y_3 & \nabla^2 y_3 & & \\
x_4 & y_4 & \nabla y_4 & \nabla^2 y_4 & \nabla^3 y_4 & \\
x_5 & y_5 & \nabla y_5 & \nabla^2 y_5 & \nabla^3 y_5 & \nabla^4 y_5
\end{array}$$

Table 3.1

The diagonal terms ($y_1$, $\nabla y_2$, $\nabla^2 y_3$, $\nabla^3 y_4$ and $\nabla^4 y_5$) in the table are the coefficients
of the polynomial. If the data points are listed in a different order, the entries in the table
will change, but the resultant polynomial will be the same; recall that the polynomial
of lowest degree interpolating n distinct data points is unique.


■ newtonCoeff

Machine computations are best carried out within a one-dimensional array a
employing the following algorithm:

function a = newtonCoeff(xData,yData)
% Returns coefficients of Newton's polynomial.
% USAGE: a = newtonCoeff(xData,yData)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.

n = length(xData);
a = yData;
for k = 2:n
    a(k:n) = (a(k:n) - a(k-1))./(xData(k:n) - xData(k-1));
end

Initially, a contains the y-values of the data, so that it is identical to the second
column in Table 3.1. Each pass through the for-loop generates the entries in the next
column, which overwrite the corresponding elements of a. Therefore, a ends up
containing the diagonal terms of Table 3.1; i.e., the coefficients of the polynomial.
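The two steps of Newton's method can be traced on a small data set. The script below is a sketch that inlines the loops of the coefficient and evaluation algorithms so that it runs on its own; the three data points (0, 7), (2, 11) and (3, 28) are an assumption chosen for the illustration.

```matlab
% Newton's interpolation for the three points (0,7), (2,11), (3,28).
xData = [0; 2; 3];
yData = [7; 11; 28];
n = length(xData);

% Coefficients: column-by-column divided differences, stored in a
a = yData;
for k = 2:n
    a(k:n) = (a(k:n) - a(k-1))./(xData(k:n) - xData(k-1));
end

% Evaluation at x = 1 by the backward recurrence of Eq. (3.4)
x = 1;
p = a(n);
for k = 1:n-1
    p = a(n-k) + (x - xData(n-k))*p;
end
fprintf('a = [%g %g %g],  P(1) = %g\n', a, p);
```

Working the same data by hand gives the first column 2, 7 and the second column 5, so the coefficients are 7, 2 and 5, and the recurrence yields 5, then 2 − 5 = −3, then 7 − 3 = 4.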

Neville's Method 

Newton's method of interpolation involves two steps: computation of the coefficients,
followed by evaluation of the polynomial. This works well if the interpolation is carried
out repeatedly at different values of x using the same polynomial. If only one point is
to be interpolated, a method that computes the interpolant in a single step, such as
Neville's algorithm, is a better choice.

Let $P_k[x_i, x_{i+1}, \ldots, x_{i+k}]$ denote the polynomial of degree k that passes through
the k + 1 data points $(x_i, y_i), (x_{i+1}, y_{i+1}), \ldots, (x_{i+k}, y_{i+k})$. For a single data point, we
have

$$P_0[x_i] = y_i \tag{3.7}$$

The interpolant based on two data points is

$$P_1[x_i, x_{i+1}] = \frac{(x - x_{i+1})P_0[x_i] + (x_i - x)P_0[x_{i+1}]}{x_i - x_{i+1}}$$

It is easily verified that $P_1[x_i, x_{i+1}]$ passes through the two data points; that is,
$P_1[x_i, x_{i+1}] = y_i$ when $x = x_i$, and $P_1[x_i, x_{i+1}] = y_{i+1}$ when $x = x_{i+1}$.

The three-point interpolant is

$$P_2[x_i, x_{i+1}, x_{i+2}] = \frac{(x - x_{i+2})P_1[x_i, x_{i+1}] + (x_i - x)P_1[x_{i+1}, x_{i+2}]}{x_i - x_{i+2}}$$

To show that this interpolant does intersect the data points, we first substitute $x = x_i$,
obtaining

$$P_2[x_i, x_{i+1}, x_{i+2}] = P_1[x_i, x_{i+1}] = y_i$$

Similarly, $x = x_{i+2}$ yields

$$P_2[x_i, x_{i+1}, x_{i+2}] = P_1[x_{i+1}, x_{i+2}] = y_{i+2}$$

Finally, when $x = x_{i+1}$ we have

$$P_1[x_i, x_{i+1}] = P_1[x_{i+1}, x_{i+2}] = y_{i+1}$$

so that

$$P_2[x_i, x_{i+1}, x_{i+2}] = \frac{(x_{i+1} - x_{i+2})y_{i+1} + (x_i - x_{i+1})y_{i+1}}{x_i - x_{i+2}} = y_{i+1}$$

Having established the pattern, we can now deduce the general recursive
formula:

$$P_k[x_i, x_{i+1}, \ldots, x_{i+k}] = \frac{(x - x_{i+k})P_{k-1}[x_i, x_{i+1}, \ldots, x_{i+k-1}] + (x_i - x)P_{k-1}[x_{i+1}, x_{i+2}, \ldots, x_{i+k}]}{x_i - x_{i+k}} \tag{3.8}$$

Given the value of x, the computations can be carried out in the following tabular
format (shown for four data points):

$$\begin{array}{c|cccc}
 & k = 0 & k = 1 & k = 2 & k = 3 \\
\hline
x_1 & P_0[x_1] = y_1 & P_1[x_1, x_2] & P_2[x_1, x_2, x_3] & P_3[x_1, x_2, x_3, x_4] \\
x_2 & P_0[x_2] = y_2 & P_1[x_2, x_3] & P_2[x_2, x_3, x_4] & \\
x_3 & P_0[x_3] = y_3 & P_1[x_3, x_4] & & \\
x_4 & P_0[x_4] = y_4 & & &
\end{array}$$

Table 3.2

■ neville

This algorithm works with the one-dimensional array y, which initially contains the
y-values of the data (the second column in Table 3.2). Each pass through the for-loop
computes the terms in the next column of the table, which overwrite the previous
elements of y. At the end of the procedure, y contains the diagonal terms of the table.
The value of the interpolant (evaluated at x) that passes through all the data points is
$y_1$, the first element of y.

function yInterp = neville(xData,yData,x)
% Neville's polynomial interpolation;
% returns the value of the interpolant at x.
% USAGE: yInterp = neville(xData,yData,x)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.

n = length(xData);
y = yData;
for k = 1:n-1
    y(1:n-k) = ((x - xData(k+1:n)).*y(1:n-k)...
             + (xData(1:n-k) - x).*y(2:n-k+1))...
             ./(xData(1:n-k) - xData(k+1:n));
end
yInterp = y(1);


Limitations of Polynomial Interpolation 

Polynomial interpolation should be carried out with the smallest feasible number of
data points. Linear interpolation, using the nearest two points, is often sufficient if
the data points are closely spaced. Three to six nearest-neighbor points produce good
results in most cases. An interpolant intersecting more than six points must be viewed
with suspicion. The reason is that the data points that are far from the point of interest
do not contribute to the accuracy of the interpolant. In fact, they can be detrimental.

The danger of using too many points is illustrated in Fig. 3.3. There are 11 equally
spaced data points represented by the circles. The solid line is the interpolant, a
polynomial of degree ten, that intersects all the points. As seen in the figure, a polynomial
of such a high degree has a tendency to oscillate excessively between the data points.
A much smoother result would be obtained by using a cubic interpolant spanning
four nearest-neighbor points.
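The oscillation of a high-degree interpolant is easy to reproduce. The sketch below does not use the data of Fig. 3.3, which are not given; it assumes the test function 1/(1 + 25x²) sampled at 11 equally spaced points on [−1, 1], and it compares the degree-ten interpolant with piecewise linear interpolation between neighboring points as a simple low-order alternative.

```matlab
% Degree-10 interpolant through 11 equally spaced samples of an assumed
% test function, compared with the function itself.
f = @(x) 1./(1 + 25*x.^2);           % assumed test function (not Fig. 3.3 data)
xData = linspace(-1, 1, 11);
yData = f(xData);
coef  = polyfit(xData, yData, 10);   % 11 points, degree 10: interpolation
x     = linspace(-1, 1, 401);
errHigh = max(abs(polyval(coef, x) - f(x)));
% Piecewise linear interpolation between neighboring data points
errLin  = max(abs(interp1(xData, yData, x) - f(x)));
fprintf('max error: degree 10 = %.3f, piecewise linear = %.3f\n', errHigh, errLin);
```

For this particular function the high-degree interpolant is worst near the ends of the interval, where its error exceeds the size of the function itself.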



Figure 3.3. Polynomial interpolant displaying oscillations. 

















Polynomial extrapolation (interpolating outside the range of data points) is
dangerous. As an example, consider Fig. 3.4. There are six data points, shown as circles. The
fifth-degree interpolating polynomial is represented by the solid line. The interpolant
looks fine within the range of data points, but drastically departs from the obvious
trend when x > 12. Extrapolating y at x = 14, for example, would be absurd in this case.



Figure 3.4. Extrapolation may not follow the trend of data. 


If extrapolation cannot be avoided, the following measures can be useful:

• Plot the data and visually verify that the extrapolated value makes sense. 

• Use a low-order polynomial based on nearest-neighbor data points. A linear or 
quadratic interpolant, for example, would yield a reasonable estimate of y (14) for 
the data in Fig. 3.4. 

• Work with a plot of log x vs. log y, which is usually much smoother than the x-y 
curve, and thus safer to extrapolate. Frequently this plot is almost a straight line. 
This is illustrated in Fig. 3.5, which represents the logarithmic plot of the data in 
Fig. 3.4. 



Figure 3.5. Logarithmic plot of the data in Fig. 3.4.

EXAMPLE 3.1

Given the data points

    x     0     2     3
    y     7    11    28

use Lagrange's method to determine y at x = 1.

Solution

$$\ell_1 = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} = \frac{(1 - 2)(1 - 3)}{(0 - 2)(0 - 3)} = \frac{1}{3}$$

$$\ell_2 = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} = \frac{(1 - 0)(1 - 3)}{(2 - 0)(2 - 3)} = 1$$

$$\ell_3 = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)} = \frac{(1 - 0)(1 - 2)}{(3 - 0)(3 - 2)} = -\frac{1}{3}$$

$$y = y_1\ell_1 + y_2\ell_2 + y_3\ell_3 = \frac{7}{3} + 11 - \frac{28}{3} = 4$$


EXAMPLE 3.2 

The data points

    x    -2     1     4    -1     3    -4
    y    -1     2    59     4    24   -53

lie on a polynomial. Determine the degree of this polynomial by constructing the
divided difference table, similar to Table 3.1.

Solution

    i    x_i    y_i    ∇y_i   ∇²y_i   ∇³y_i   ∇⁴y_i   ∇⁵y_i
    1    -2     -1
    2     1      2      1
    3     4     59     10       3
    4    -1      4      5      -2       1
    5     3     24      5       2       1       0
    6    -4    -53     26      -5       1       0       0

Here are a few sample calculations used in arriving at the figures in the table:

$$\nabla y_3 = \frac{y_3 - y_1}{x_3 - x_1} = \frac{59 - (-1)}{4 - (-2)} = 10$$

$$\nabla^2 y_3 = \frac{\nabla y_3 - \nabla y_2}{x_3 - x_2} = \frac{10 - 1}{4 - 1} = 3$$

$$\nabla^3 y_6 = \frac{\nabla^2 y_6 - \nabla^2 y_3}{x_6 - x_3} = \frac{-5 - 3}{-4 - 4} = 1$$

From the table we see that the last nonzero coefficient (last nonzero diagonal term)
of Newton's polynomial is $\nabla^3 y_4$, which is the coefficient of the cubic term. Hence the
polynomial is a cubic.
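The table can be checked by machine with the same loop that newtonCoeff uses. The script below is a sketch that inlines that loop so it runs on its own; the variable degree and the threshold 1e-12 used to decide which coefficients count as zero are assumptions made for the illustration.

```matlab
% Divided differences for the data of Example 3.2.
xData = [-2; 1; 4; -1; 3; -4];
yData = [-1; 2; 59; 4; 24; -53];
n = length(xData);
a = yData;
for k = 2:n
    a(k:n) = (a(k:n) - a(k-1))./(xData(k:n) - xData(k-1));
end
disp(a')                                        % diagonal terms of the table
degree = find(abs(a) > 1e-12, 1, 'last') - 1;   % last nonzero coefficient
fprintf('degree of the polynomial: %d\n', degree);
```

With data that contain roundoff or noise the higher differences would be small rather than exactly zero, so the threshold would have to be chosen with the data accuracy in mind.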

EXAMPLE 3.3 

Given the data points

    x     4.0        3.9        3.8        3.7
    y    -0.06604   -0.02724    0.01282    0.05383

determine the root of y(x) = 0 by Neville's method.

Solution This is an example of inverse interpolation, where the roles of x and y are
interchanged. Instead of computing y at a given x, we are finding x that corresponds
to a given y (in this case, y = 0). Employing the format of Table 3.2 (with x and y
interchanged, of course), we obtain

    i     y_i        P_0[ ] = x_i   P_1[ , ]   P_2[ , , ]   P_3[ , , , ]
    1    -0.06604    4.0            3.8298     3.8316       3.8317
    2    -0.02724    3.9            3.8320     3.8318
    3     0.01282    3.8            3.8313
    4     0.05383    3.7

The following are a couple of sample computations used in the table:

$$P_1[y_1, y_2] = \frac{(y - y_2)P_0[y_1] + (y_1 - y)P_0[y_2]}{y_1 - y_2} = \frac{(0 + 0.02724)(4.0) + (-0.06604 - 0)(3.9)}{-0.06604 + 0.02724} = 3.8298$$

$$P_2[y_2, y_3, y_4] = \frac{(y - y_4)P_1[y_2, y_3] + (y_2 - y)P_1[y_3, y_4]}{y_2 - y_4} = \frac{(0 - 0.05383)(3.8320) + (-0.02724 - 0)(3.8313)}{-0.02724 - 0.05383} = 3.8318$$

All the P's in the table are estimates of the root resulting from different orders
of interpolation involving different data points. For example, $P_1[y_1, y_2]$ is the root
obtained from linear interpolation based on the first two points, and $P_2[y_2, y_3, y_4]$ is
the result from quadratic interpolation using the last three points. The root obtained
from cubic interpolation over all four data points is $x = P_3[y_1, y_2, y_3, y_4] = 3.8317$.
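The table of this example can be regenerated with a short script. The following is a sketch that inlines the loop of Neville's algorithm with the roles of x and y interchanged; the names t, P, y0 and root are assumptions introduced here.

```matlab
% Inverse interpolation by Neville's scheme for the data of Example 3.3:
% the roles of x and y are swapped and the interpolant is evaluated at y = 0.
xData = [4.0; 3.9; 3.8; 3.7];
yData = [-0.06604; -0.02724; 0.01282; 0.05383];
t  = yData;        % abscissas of the inverse problem
P  = xData;        % k = 0 column of the table
y0 = 0;            % value at which the inverse interpolant is evaluated
n  = length(t);
for k = 1:n-1
    P(1:n-k) = ((y0 - t(k+1:n)).*P(1:n-k) + (t(1:n-k) - y0).*P(2:n-k+1)) ...
               ./(t(1:n-k) - t(k+1:n));
    fprintf('k = %d: %s\n', k, mat2str(P(1:n-k)', 5));   % column k of the table
end
root = P(1);       % estimate based on all four points
```

Each printed row corresponds to one column of the table, so the successive estimates of the root can be compared directly.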

EXAMPLE 3.4 

The data points in the table lie on the plot of $f(x) = 4.8\cos(\pi x/20)$. Interpolate this
data by Newton's method at x = 0, 0.5, 1.0, ..., 8.0 and compare the results with the
"exact" values given by y = f(x).

    x    0.15      2.30      3.15      4.85      6.25      7.95
    y    4.79867   4.49013   4.22430   3.47313   2.66674   1.51909

Solution

% Example 3.4 (Newton's interpolation)
xData = [0.15; 2.3; 3.15; 4.85; 6.25; 7.95];
yData = [4.79867; 4.49013; 4.22430; 3.47313;...
         2.66674; 1.51909];
a = newtonCoeff(xData,yData);
'    x    yInterp   yExact'
for x = 0: 0.5: 8
    y = newtonPoly(a,xData,x);
    yExact = 4.8*cos(pi*x/20);
    fprintf('%10.5f',x,y,yExact)
    fprintf('\n')
end

The results are:

ans =
    x    yInterp   yExact
   0.00000   4.80003   4.80000
   0.50000   4.78518   4.78520
   1.00000   4.74088   4.74090
   1.50000   4.66736   4.66738
   2.00000   4.56507   4.56507
   2.50000   4.43462   4.43462
   3.00000   4.27683   4.27683
   3.50000   4.09267   4.09267
   4.00000   3.88327   3.88328
   4.50000   3.64994   3.64995
   5.00000   3.39411   3.39411
   5.50000   3.11735   3.11735
   6.00000   2.82137   2.82137
   6.50000   2.50799   2.50799
   7.00000   2.17915   2.17915
   7.50000   1.83687   1.83688
   8.00000   1.48329   1.48328


3.3 Interpolation with Cubic Spline 

If there are more than a few data points, a cubic spline is hard to beat as a global
interpolant. It is considerably "stiffer" than a polynomial in the sense that it has less
tendency to oscillate between data points.

Figure 3.6. Mechanical model of natural cubic spline. (Labels in the figure: elastic strip; pins (data points).)

The mechanical model of a cubic spline is shown in Fig. 3.6. It is a thin, elastic strip
that is attached with pins to the data points. Because the strip is unloaded between the
pins, each segment of the spline curve is a cubic polynomial: recall from beam theory
that the differential equation for the displacement of a beam is $d^4y/dx^4 = q/(EI)$,
so that y(x) is a cubic since the load q vanishes. At the pins, the slope and bending
moment (and hence the second derivative) are continuous. There is no bending moment
at the two end pins; hence the second derivative of the spline is zero at the end
points. Since these end conditions occur naturally in the beam model, the resulting
curve is known as the natural cubic spline. The pins, i.e., the data points, are called
the knots of the spline.



Figure 3.7 shows a cubic spline that spans n knots. We use the notation $f_{i,i+1}(x)$
for the cubic polynomial that spans the segment between knots i and i + 1. Note
that the spline is a piecewise cubic curve, put together from the n − 1 cubics
$f_{1,2}(x), f_{2,3}(x), \ldots, f_{n-1,n}(x)$, all of which have different coefficients.

If we denote the second derivative of the spline at knot i by $k_i$, continuity of second
derivatives requires that

$$f''_{i-1,i}(x_i) = f''_{i,i+1}(x_i) = k_i \tag{a}$$

At this stage, each k is unknown, except for

$$k_1 = k_n = 0 \tag{3.9}$$

The starting point for computing the coefficients of $f_{i,i+1}(x)$ is the expression for
$f''_{i,i+1}(x)$, which we know to be linear. Using Lagrange's two-point interpolation, we
can write

$$f''_{i,i+1}(x) = k_i\ell_i(x) + k_{i+1}\ell_{i+1}(x)$$

where

$$\ell_i(x) = \frac{x - x_{i+1}}{x_i - x_{i+1}} \qquad \ell_{i+1}(x) = \frac{x - x_i}{x_{i+1} - x_i}$$

Therefore,

$$f''_{i,i+1}(x) = \frac{k_i(x - x_{i+1}) - k_{i+1}(x - x_i)}{x_i - x_{i+1}} \tag{b}$$


Integrating twice with respect to x, we obtain

$$f_{i,i+1}(x) = \frac{k_i(x - x_{i+1})^3 - k_{i+1}(x - x_i)^3}{6(x_i - x_{i+1})} + A(x - x_{i+1}) - B(x - x_i) \tag{c}$$

where A and B are constants of integration. The last two terms in Eq. (c) would usually
be written as Cx + D. By letting C = A − B and $D = -Ax_{i+1} + Bx_i$, we end up with
the terms in Eq. (c), which are more convenient to use in the computations that
follow.

Imposing the condition $f_{i,i+1}(x_i) = y_i$, we get from Eq. (c)

$$\frac{k_i(x_i - x_{i+1})^3}{6(x_i - x_{i+1})} + A(x_i - x_{i+1}) = y_i$$

Therefore,

$$A = \frac{y_i}{x_i - x_{i+1}} - \frac{k_i}{6}(x_i - x_{i+1}) \tag{d}$$

Similarly, $f_{i,i+1}(x_{i+1}) = y_{i+1}$ yields

$$B = \frac{y_{i+1}}{x_i - x_{i+1}} - \frac{k_{i+1}}{6}(x_i - x_{i+1}) \tag{e}$$

Substituting Eqs. (d) and (e) into Eq. (c) results in

$$f_{i,i+1}(x) = \frac{k_i}{6}\left[\frac{(x - x_{i+1})^3}{x_i - x_{i+1}} - (x - x_{i+1})(x_i - x_{i+1})\right] - \frac{k_{i+1}}{6}\left[\frac{(x - x_i)^3}{x_i - x_{i+1}} - (x - x_i)(x_i - x_{i+1})\right] + \frac{y_i(x - x_{i+1}) - y_{i+1}(x - x_i)}{x_i - x_{i+1}} \tag{3.10}$$

The second derivatives $k_i$ of the spline at the interior knots are obtained from
the slope continuity conditions $f'_{i-1,i}(x_i) = f'_{i,i+1}(x_i)$, where i = 2, 3, ..., n - 1. After a
little algebra, this results in the simultaneous equations

$$ k_{i-1}(x_{i-1} - x_i) + 2k_i(x_{i-1} - x_{i+1}) + k_{i+1}(x_i - x_{i+1}) = 6\left(\frac{y_{i-1} - y_i}{x_{i-1} - x_i} - \frac{y_i - y_{i+1}}{x_i - x_{i+1}}\right), \quad i = 2, 3, \ldots, n-1 \tag{3.11} $$

Because Eqs. (3.11) have a tridiagonal coefficient matrix, they can be solved economically
with functions LUdec3 and LUsol3 described in Art. 2.4.

If the data points are evenly spaced at intervals h, then $x_{i-1} - x_i = x_i - x_{i+1} = -h$,
and the Eqs. (3.11) simplify to

$$ k_{i-1} + 4k_i + k_{i+1} = \frac{6}{h^2}\left(y_{i-1} - 2y_i + y_{i+1}\right), \quad i = 2, 3, \ldots, n-1 \tag{3.12} $$

■ splineCurv 

The first stage of cubic spline interpolation is to set up Eqs. (3.11) and solve them
for the unknown k's (recall that $k_1 = k_n = 0$). This task is carried out by the function
splineCurv:

function k = splineCurv(xData,yData)
% Returns curvatures of a cubic spline at the knots.
% USAGE: k = splineCurv(xData,yData)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.

n = length(xData);
c = zeros(n-1,1); d = ones(n,1);
e = zeros(n-1,1); k = zeros(n,1);
c(1:n-2) = xData(1:n-2) - xData(2:n-1);
d(2:n-1) = 2*(xData(1:n-2) - xData(3:n));
e(2:n-1) = xData(2:n-1) - xData(3:n);
k(2:n-1) = 6*(yData(1:n-2) - yData(2:n-1))...
           ./(xData(1:n-2) - xData(2:n-1))...
         - 6*(yData(2:n-1) - yData(3:n))...
           ./(xData(2:n-1) - xData(3:n));
[c,d,e] = LUdec3(c,d,e);
k = LUsol3(c,d,e,k);
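
As a cross-check that does not depend on LUdec3 and LUsol3, Eqs. (3.11) can also be assembled as a full n x n matrix and solved with the backslash operator. The following is only a sketch under that assumption (a full matrix wastes storage when n is large); it uses the data of Example 3.5, and the names A and rhs are illustrative, not part of the functions listed in this article.

```matlab
% Sketch: Eqs. (3.11) as a full matrix, solved with backslash.
% Assumption: data of Example 3.5; A and rhs are illustrative names.
xData = [1; 2; 3; 4; 5];
yData = [0; 1; 0; 1; 0];
n = length(xData);
A = eye(n);                 % rows 1 and n enforce k(1) = k(n) = 0
rhs = zeros(n,1);
for i = 2:n-1               % interior knots, Eq. (3.11)
    A(i,i-1) = xData(i-1) - xData(i);
    A(i,i)   = 2*(xData(i-1) - xData(i+1));
    A(i,i+1) = xData(i) - xData(i+1);
    rhs(i) = 6*((yData(i-1) - yData(i))/(xData(i-1) - xData(i)) ...
              - (yData(i) - yData(i+1))/(xData(i) - xData(i+1)));
end
k = A\rhs;
disp(k')
```

The interior values should agree with the hand solution of Example 3.5, namely k2 = k4 = -30/7 and k3 = 36/7.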

■ splineEval 

The function splineEval computes the interpolant at x from Eq. (3.10). The subfunction
findSeg finds the segment of the spline that contains x by the method
of bisection. It returns the segment number; that is, the value of the subscript i in
Eq. (3.10).

function y = splineEval(xData,yData,k,x)
% Returns value of cubic spline interpolant at x.
% USAGE: y = splineEval(xData,yData,k,x)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.
% k     = curvatures of spline at the knots;
%         returned by function splineCurv.

i = findSeg(xData,x);
h = xData(i) - xData(i+1);
y = ((x - xData(i+1))^3/h - (x - xData(i+1))*h)*k(i)/6.0...
  - ((x - xData(i))^3/h - (x - xData(i))*h)*k(i+1)/6.0...
  + yData(i)*(x - xData(i+1))/h...
  - yData(i+1)*(x - xData(i))/h;

function i = findSeg(xData,x)
% Returns index of segment containing x.
iLeft = 1; iRight = length(xData);
while 1
    if (iRight - iLeft) <= 1
        i = iLeft; return
    end
    i = fix((iLeft + iRight)/2);
    if x < xData(i)
        iRight = i;
    else
        iLeft = i;
    end
end
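
To see how the bisection search narrows down the segment, the loop of findSeg can be run on its own. The sketch below repeats that logic inline for a hypothetical query point x = 3.4 and five equally spaced knots; the values are chosen for illustration only.

```matlab
% Sketch: the bisection search of findSeg, run inline.
% Assumption: knots and query point are illustrative values.
xData = [1; 2; 3; 4; 5];
x = 3.4;
iLeft = 1; iRight = length(xData);
while iRight - iLeft > 1
    i = fix((iLeft + iRight)/2);     % midpoint knot
    if x < xData(i)
        iRight = i;                  % x lies to the left of knot i
    else
        iLeft = i;                   % x lies at or to the right of knot i
    end
    fprintf('bracket: knots %d to %d\n', iLeft, iRight)
end
fprintf('x = %g lies in segment %d\n', x, iLeft)
```

The bracket shrinks from knots 1-5 to 3-5 and then to 3-4, so the search reports segment 3, which is the subscript i to be used in Eq. (3.10).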
EXAMPLE 3.5

Use natural cubic spline to determine y at x = 1.5. The data points are

    x    1    2    3    4    5
    y    0    1    0    1    0

Solution The five knots are equally spaced at h = 1. Recalling that the second derivative
of a natural spline is zero at the first and last knot, we have $k_1 = k_5 = 0$. The second
derivatives at the other knots are obtained from Eq. (3.12). Using i = 2, 3, 4 we get the
simultaneous equations

$$ 0 + 4k_2 + k_3 = 6[0 - 2(1) + 0] = -12 $$
$$ k_2 + 4k_3 + k_4 = 6[1 - 2(0) + 1] = 12 $$
$$ k_3 + 4k_4 + 0 = 6[0 - 2(1) + 0] = -12 $$

The solution is $k_2 = k_4 = -30/7$, $k_3 = 36/7$.

The point x = 1.5 lies in the segment between knots 1 and 2. The corresponding
interpolant is obtained from Eq. (3.10) by setting i = 1. With $x_i - x_{i+1} = -h = -1$, we
obtain

$$ f_{1,2}(x) = -\frac{k_1}{6}\left[(x - x_2)^3 - (x - x_2)\right] + \frac{k_2}{6}\left[(x - x_1)^3 - (x - x_1)\right] - \left[y_1(x - x_2) - y_2(x - x_1)\right] $$

Therefore,

$$ y(1.5) = f_{1,2}(1.5) = 0 + \frac{1}{6}\left(-\frac{30}{7}\right)\left[(1.5 - 1)^3 - (1.5 - 1)\right] - \left[0 - 1(1.5 - 1)\right] = 0.7679 $$

The plot of the interpolant, which in this case is made up of four cubic segments, is 
shown in the figure. 



EXAMPLE 3.6

Sometimes it is preferable to replace one or both of the end conditions of the cubic
spline with something other than the natural conditions. Use the end condition
$f'_{1,2}(0) = 0$ (zero slope), rather than $f''_{1,2}(0) = 0$ (zero curvature), to determine the cubic
spline interpolant at x = 2.6 based on the data points

    x    0    1    2      3
    y    1    1    0.5    0

Solution We must first modify Eqs. (3.12) to account for the new end condition. Setting
i = 1 in Eq. (3.10) and differentiating, we get

$$ f'_{1,2}(x) = \frac{k_1}{6}\left[3\frac{(x - x_2)^2}{x_1 - x_2} - (x_1 - x_2)\right] - \frac{k_2}{6}\left[3\frac{(x - x_1)^2}{x_1 - x_2} - (x_1 - x_2)\right] + \frac{y_1 - y_2}{x_1 - x_2} $$

Thus the end condition $f'_{1,2}(x_1) = 0$ yields

$$ \frac{k_1}{3}(x_1 - x_2) + \frac{k_2}{6}(x_1 - x_2) + \frac{y_1 - y_2}{x_1 - x_2} = 0 $$

or

$$ 2k_1 + k_2 = -6\,\frac{y_1 - y_2}{(x_1 - x_2)^2} $$

From the given data we see that $y_1 = y_2 = 1$, so that the last equation becomes

$$ 2k_1 + k_2 = 0 \tag{a} $$

The other equations in Eq. (3.12) are unchanged. Noting that $k_4 = 0$ and h = 1, we
have

$$ k_1 + 4k_2 + k_3 = 6[1 - 2(1) + 0.5] = -3 \tag{b} $$
$$ k_2 + 4k_3 = 6[1 - 2(0.5) + 0] = 0 \tag{c} $$

The solution of Eqs. (a)-(c) is $k_1 = 0.4615$, $k_2 = -0.9231$, $k_3 = 0.2308$.

The interpolant can now be evaluated from Eq. (3.10). Substituting i = 3 and
$x_i - x_{i+1} = -1$, we obtain

$$ f_{3,4}(x) = \frac{k_3}{6}\left[-(x - x_4)^3 + (x - x_4)\right] - \frac{k_4}{6}\left[-(x - x_3)^3 + (x - x_3)\right] - y_3(x - x_4) + y_4(x - x_3) $$

Therefore,

$$ y(2.6) = f_{3,4}(2.6) = \frac{0.2308}{6}\left[-(-0.4)^3 + (-0.4)\right] + 0 - 0.5(-0.4) + 0 = 0.1871 $$
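
The hand computation of Example 3.6 can be checked numerically. The sketch below solves the three equations 2k1 + k2 = 0, k1 + 4k2 + k3 = -3 and k2 + 4k3 = 0 with backslash and then evaluates Eq. (3.10) in segment 3; the variable names are illustrative, and the segment index is set by hand rather than found by a search.

```matlab
% Sketch: numerical check of Example 3.6 (zero-slope condition at knot 1).
% Assumption: segment index i = 3 is set by hand for x = 2.6.
A = [2 1 0; 1 4 1; 0 1 4];
rhs = [0; -3; 0];
k = [A\rhs; 0];                      % append the natural condition k4 = 0
xData = [0; 1; 2; 3];
yData = [1; 1; 0.5; 0];
x = 2.6; i = 3;
h = xData(i) - xData(i+1);
y = ((x - xData(i+1))^3/h - (x - xData(i+1))*h)*k(i)/6 ...
  - ((x - xData(i))^3/h - (x - xData(i))*h)*k(i+1)/6 ...
  + yData(i)*(x - xData(i+1))/h ...
  - yData(i+1)*(x - xData(i))/h;
fprintf('k = %.4f %.4f %.4f %.4f, y(2.6) = %.4f\n', k, y)
```

The curvatures come out as 6/13, -12/13 and 3/13, which round to the values quoted in the example, and y(2.6) rounds to 0.1871.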
EXAMPLE 3.7

Write a program that interpolates between given data points with the natural cubic 
spline. The program must be able to evaluate the interpolant for more than one value 
of x. As a test, use data points specified in Example 3.5 and compute the interpolant
at x = 1.5 and x = 4.5 (due to symmetry, these values should be equal). 

Solution The program below prompts for x; it is terminated by pressing the “return” 
key. 

% Example 3.7 (Cubic spline)
xData = [1; 2; 3; 4; 5];
yData = [0; 1; 0; 1; 0];
k = splineCurv(xData,yData);
while 1
    x = input('x = ');
    if isempty(x)
        fprintf('Done'); break
    end
    y = splineEval(xData,yData,k,x)
    fprintf('\n')
end


Running the program produces the following results: 
x = 1.5 

y = 

0.7679 

x = 4.5 

y = 

0.7679 


x = 

Done 

PROBLEM SET 3.1 

1. Given the data points

    x    -1.2     0.3     1.1
    y    -5.76   -5.61   -3.69

determine y at x = 0 using (a) Neville's method and (b) Lagrange's method.
2. Find the zero of y(x) from the following data:

    x    0        0.5      1        1.5      2        2.5       3
    y    1.8421   2.4694   2.4921   1.9047   0.8509   -0.4112   -1.5727

Use Lagrange's interpolation over (a) three; and (b) four nearest-neighbor data
points. Hint: after finishing part (a), part (b) can be computed with a relatively
small effort.

3. The function y(x) represented by the data in Prob. 2 has a maximum at x = 0.7679.
Compute this maximum by Neville’s interpolation over four nearest-neighbor 
data points. 

4. Use Neville's method to compute y at x = π/4 from the data points

    x    0       0.5    1      1.5    2
    y    -1.00   1.75   4.00   5.75   7.00

5. Given the data

    x    0         0.5      1        1.5      2
    y    -0.7854   0.6529   1.7390   2.2071   1.9425

find y at x = π/4 and at π/2. Use the method that you consider to be most convenient.

6. The points

    x    -2    1    4     -1    3     -4
    y    -1    2    59    4     24    -53

lie on a polynomial. Use the divided difference table of Newton's method to determine
the degree of the polynomial.

7. Use Newton's method to find the expression for the lowest-order polynomial that
fits the following points:

    x    -3    2    -1    3     1
    y    0     5    -4    12    0
8. Use Neville's method to determine the equation of the quadratic that passes
through the points

    x    -1    1     3
    y    17    -7    -15


9. The density of air ρ varies with elevation h in the following manner:

    h (km)       0        3        6
    ρ (kg/m³)    1.225    0.905    0.652

Express ρ(h) as a quadratic function using Lagrange's method.

10. Determine the natural cubic spline that passes through the data points

    x    0    1    2
    y    0    2    1

Note that the interpolant consists of two cubics, one valid in 0 ≤ x ≤ 1, the other
in 1 ≤ x ≤ 2. Verify that these cubics have the same first and second derivatives
at x = 1.

11. Given the data points

    x    1     2     3     4    5
    y    13    15    12    9    13

determine the natural cubic spline interpolant at x = 3.4.

12. Compute the zero of the function y(x) from the following data:

    x    0.2      0.4      0.6      0.8       1.0
    y    1.150    0.855    0.377    -0.266    -1.049

Use inverse interpolation with the natural cubic spline. Hint: reorder the data so
that the values of y are in ascending order.

13. Solve Example 3.6 with a cubic spline that has constant second derivatives within
its first and last segments (the end segments are parabolic). The end conditions
for this spline are $k_1 = k_2$ and $k_{n-1} = k_n$.








































14. ■ Write a computer program for interpolation by Neville's method. The program
must be able to compute the interpolant at several user-specified values of x. Test
the program by determining y at x = 1.1, 1.2 and 1.3 from the following data:

    x    -2.0      -0.1      -1.5      0.5
    y    2.2796    1.0025    1.6467    1.0635
    x    -0.6      2.2       1.0       1.8
    y    1.0920    2.6291    1.2661    1.9896

(Answer: y = 1.3262, 1.3938, 1.4693)

15. ■ The specific heat $c_p$ of aluminum depends on temperature T as follows:⁴

    T (°C)           -250      -200     -100     0        100      300
    c_p (kJ/kg·K)    0.0163    0.318    0.699    0.870    0.941    1.04

Determine $c_p$ at T = 200°C and 400°C.

16. ■ Find y at x = 0.46 from the data

    x    0        0.0204    0.1055    0.241    0.582    0.712    0.981
    y    0.385    1.04      1.79      2.63     4.39     4.99     5.27

17. ■ The table shows the drag coefficient $c_D$ of a sphere as a function of Reynolds
number Re.⁵ Use natural cubic spline to find $c_D$ at Re = 5, 50, 500 and 5000. Hint:
use log-log scale.

    Re     0.2    2       20      200      2000     20 000
    c_D    103    13.9    2.72    0.800    0.401    0.433

18. ■ Solve Prob. 17 using a polynomial interpolant intersecting four nearest-neighbor
data points.

19. ■ The kinematic viscosity $\mu_k$ of water varies with temperature T in the following
manner:

    T (°C)             0       21.1    37.8     54.4     71.1     87.8     100
    μ_k (10⁻³ m²/s)    1.79    1.13    0.696    0.519    0.338    0.321    0.296

Interpolate $\mu_k$ at T = 10°, 30°, 60° and 90°C.

4 Source: Black, Z.B., and Hartley, J.G., Thermodynamics, Harper & Row, 1985.

5 Source: Kreith, F., Principles of Heat Transfer, Harper & Row, 1973.

20. ■ The table shows how the relative density ρ of air varies with altitude h. Determine
the relative density of air at 10.5 km.

    h (km)    0    1.525     3.050     4.575     6.10      7.625     9.150
    ρ         1    0.8617    0.7385    0.6292    0.5328    0.4481    0.3741

3.4 Least-Squares Fit 

Overview 

If the data are obtained from experiments, they typically contain a significant amount 
of random noise due to measurement errors. The task of curve fitting is to find a 
smooth curve that fits the data points “on the average.” This curve should have a 
simple form (e.g. a low-order polynomial), so as to not reproduce the noise. 

Let

$$ f(x) = f(x; a_1, a_2, \ldots, a_m) $$

be the function that is to be fitted to the n data points $(x_i, y_i)$, i = 1, 2, ..., n. The
notation implies that we have a function of x that contains the parameters $a_j$,
j = 1, 2, ..., m, where m < n. The form of f(x) is determined beforehand, usually
from the theory associated with the experiment from which the data is obtained. The
only means of adjusting the fit is the parameters. For example, if the data represent the
displacements $y_i$ of an overdamped mass-spring system at time $t_i$, the theory suggests
the choice $f(t) = a_1 t e^{-a_2 t}$. Thus curve fitting consists of two steps: choosing the form
of f(x), followed by computation of the parameters that produce the best fit to the
data.

This brings us to the question: what is meant by “best” fit? If the noise is confined 
to the y-coordinate, the most commonly used measure is the least-squares fit, which 
minimizes the function 

$$ S(a_1, a_2, \ldots, a_m) = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 \tag{3.13} $$

with respect to each $a_j$. Therefore, the optimal values of the parameters are given by
the solution of the equations

$$ \frac{\partial S}{\partial a_k} = 0, \quad k = 1, 2, \ldots, m \tag{3.14} $$
The terms $r_i = y_i - f(x_i)$ in Eq. (3.13) are called residuals; they represent the discrepancy
between the data points and the fitting function at $x_i$. The function S to be
minimized is thus the sum of the squares of the residuals. Equations (3.14) are generally
nonlinear in $a_j$ and may thus be difficult to solve. If the fitting function is chosen
as a linear combination of specified functions $f_j(x)$:

$$ f(x) = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_m f_m(x) $$

then Eqs. (3.14) are linear. A typical example is a polynomial where $f_1(x) = 1$, $f_2(x) = x$,
$f_3(x) = x^2$, etc.

The spread of the data about the fitting curve is quantified by the standard deviation,
defined as

$$ \sigma = \sqrt{\frac{S}{n - m}} \tag{3.15} $$

Note that if n = m, we have interpolation, not curve fitting. In that case, both the
numerator and the denominator in Eq. (3.15) are zero, so that σ is meaningless, as it
should be.


Fitting a Straight Line 

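Fitting a straight line

$$ f(x) = a + bx \tag{3.16} $$

to data is also known as linear regression. In this case the function to be minimized is

$$ S(a, b) = \sum_{i=1}^{n} (y_i - a - bx_i)^2 $$

Equations (3.14) now become

$$ \frac{\partial S}{\partial a} = \sum_{i=1}^{n} -2(y_i - a - bx_i) = 2\left(-\sum_{i=1}^{n} y_i + na + b\sum_{i=1}^{n} x_i\right) = 0 $$

$$ \frac{\partial S}{\partial b} = \sum_{i=1}^{n} -2(y_i - a - bx_i)x_i = 2\left(-\sum_{i=1}^{n} x_i y_i + a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2\right) = 0 $$

Dividing both equations by 2n and rearranging terms, we get

$$ a + \bar{x}b = \bar{y} \qquad \bar{x}a + \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)b = \frac{1}{n}\sum_{i=1}^{n} x_i y_i $$

where

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{3.17} $$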
are the mean values of the x and y data. The solution for the parameters is

$$ a = \frac{\bar{y}\sum x_i^2 - \bar{x}\sum x_i y_i}{\sum x_i^2 - n\bar{x}^2} \qquad b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \tag{3.18} $$

These expressions are susceptible to roundoff errors (the two terms in each numerator
as well as in each denominator can be roughly equal). It is better to compute the
parameters from

$$ b = \frac{\sum y_i(x_i - \bar{x})}{\sum x_i(x_i - \bar{x})} \qquad a = \bar{y} - \bar{x}b \tag{3.19} $$

which are equivalent to Eqs. (3.18), but much less affected by rounding off.


Fitting Linear Forms 

Consider the least-squares fit of the linear form

$$ f(x) = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_m f_m(x) = \sum_{j=1}^{m} a_j f_j(x) \tag{3.20} $$

where each $f_j(x)$ is a predetermined function of x, called a basis function. Substitution
into Eq. (3.13) yields

$$ S = \sum_{i=1}^{n} \left[y_i - \sum_{j=1}^{m} a_j f_j(x_i)\right]^2 \tag{a} $$

Thus Eqs. (3.14) are

$$ \frac{\partial S}{\partial a_k} = -2\left\{\sum_{i=1}^{n} \left[y_i - \sum_{j=1}^{m} a_j f_j(x_i)\right] f_k(x_i)\right\} = 0, \quad k = 1, 2, \ldots, m $$

Dropping the constant (-2) and interchanging the order of summation, we get

$$ \sum_{j=1}^{m} \left[\sum_{i=1}^{n} f_j(x_i) f_k(x_i)\right] a_j = \sum_{i=1}^{n} f_k(x_i) y_i, \quad k = 1, 2, \ldots, m $$

In matrix notation these equations are

$$ \mathbf{A}\mathbf{a} = \mathbf{b} \tag{3.21a} $$

where

$$ A_{kj} = \sum_{i=1}^{n} f_j(x_i) f_k(x_i) \qquad b_k = \sum_{i=1}^{n} f_k(x_i) y_i \tag{3.21b} $$

Equations (3.21a), known as the normal equations of the least-squares fit, can be
solved with any of the methods discussed in Chapter 2. Note that the coefficient
matrix is symmetric, i.e., $A_{kj} = A_{jk}$.
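
The assembly of the normal equations for an arbitrary set of basis functions can be sketched in a few lines. The code below is a minimal illustration, not one of the book's functions: the basis functions are supplied as a cell array of function handles (here 1 and x, so the result is a straight-line fit), the data are the five points that also appear in Example 3.8, and the system is solved with backslash.

```matlab
% Sketch: normal equations A*a = b for a general linear form.
% Assumptions: basis given as function handles; data as in Example 3.8.
xData = [0.0; 1.0; 2.0; 2.5; 3.0];
yData = [2.9; 3.7; 4.1; 4.4; 5.0];
basis = {@(x) ones(size(x)), @(x) x};    % f1(x) = 1, f2(x) = x
m = length(basis);
A = zeros(m); b = zeros(m,1);
for k = 1:m
    fk = basis{k}(xData);                % f_k evaluated at all data points
    b(k) = sum(fk.*yData);
    for j = 1:m
        A(k,j) = sum(basis{j}(xData).*fk);
    end
end
a = A\b;
fprintf('a1 = %.4f, a2 = %.4f\n', a(1), a(2))
```

With this basis the coefficients are the intercept and slope of the regression line, so they should agree with 2.927 and 0.6431 found by hand in Example 3.8. Other linear forms only require a different cell array.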
Polynomial Fit

A commonly used linear form is a polynomial. If the degree of the polynomial is m - 1,
we have $f(x) = \sum_{j=1}^{m} a_j x^{j-1}$. Here the basis functions are

$$ f_j(x) = x^{j-1}, \quad j = 1, 2, \ldots, m \tag{3.22} $$

so that Eqs. (3.21b) become

$$ A_{kj} = \sum_{i=1}^{n} x_i^{j+k-2} \qquad b_k = \sum_{i=1}^{n} x_i^{k-1} y_i $$

or

$$ \mathbf{A} = \begin{bmatrix} n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m-1} \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sum x_i^{m-1} & \sum x_i^{m} & \sum x_i^{m+1} & \cdots & \sum x_i^{2m-2} \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^{m-1} y_i \end{bmatrix} \tag{3.23} $$

where $\sum$ stands for $\sum_{i=1}^{n}$. The normal equations become progressively ill-conditioned
with increasing m. Fortunately, this is of little practical consequence, because only
low-order polynomials are useful in curve fitting. Polynomials of high order are not
recommended, because they tend to reproduce the noise inherent in the data.
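
The growth of ill-conditioning with m can be observed directly. The sketch below forms the coefficient matrix of the normal equations for several values of m and prints its condition number with the built-in function cond; the x-values are those of Example 3.10 and the range of m is an arbitrary choice made for illustration.

```matlab
% Sketch: condition number of the normal-equation matrix versus m.
% Assumptions: x-data of Example 3.10; m = 2,...,6 chosen arbitrarily.
xData = [-0.04; 0.93; 1.95; 2.90; 3.83; 5.00; ...
          5.98; 7.05; 8.21; 9.08; 10.09];
mValues = 2:6;
c = zeros(size(mValues));
for p = 1:length(mValues)
    m = mValues(p);
    V = zeros(length(xData),m);
    for j = 1:m
        V(:,j) = xData.^(j-1);       % column j holds x_i^(j-1)
    end
    A = V'*V;                        % A(k,j) = sum of x_i^(j+k-2)
    c(p) = cond(A);
    fprintf('m = %d   cond(A) = %.3e\n', m, c(p))
end
```

The actual numbers depend on the data, but the condition number should increase with every added power of x, which is the behavior described above.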


■ polynFit

The function polynFit computes the coefficients of a polynomial of degree m - 1 to
fit n data points in the least-squares sense. To facilitate computations, the terms n,
$\sum x_i, \sum x_i^2, \ldots, \sum x_i^{2m-2}$ that make up the coefficient matrix A in Eq. (3.23) are first
stored in the vector s and then inserted into A. The normal equations are solved for
the coefficient vector coeff by Gauss elimination with pivoting. Since the elements
of coeff emerging from the solution are not arranged in the usual order (the coefficient
of the highest power of x first), the coeff array is "flipped" upside-down before
returning to the calling program.

function coeff = polynFit(xData,yData,m)
% Returns the coefficients of the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% that fits the data points in the least squares sense.
% USAGE: coeff = polynFit(xData,yData,m)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.

A = zeros(m); b = zeros(m,1); s = zeros(2*m-1,1);
for i = 1:length(xData)
    temp = yData(i);
    for j = 1:m
        b(j) = b(j) + temp;
        temp = temp*xData(i);
    end
    temp = 1;
    for j = 1:2*m-1
        s(j) = s(j) + temp;
        temp = temp*xData(i);
    end
end
for i = 1:m
    for j = 1:m
        A(i,j) = s(i+j-1);
    end
end
% Rearrange coefficients so that coefficient
% of x^(m-1) is first
coeff = flipdim(gaussPiv(A,b),1);
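
Because polynFit depends on gaussPiv from Chapter 2, it may be useful to have a stand-alone check. The sketch below builds the same normal equations for a quadratic, solves them with backslash instead of gaussPiv (an assumption made only to keep the example self-contained), and compares the result with MATLAB's built-in polyfit. The data are those of Example 3.10.

```matlab
% Sketch: polynomial fit via the normal equations, compared with polyfit.
% Assumptions: backslash replaces gaussPiv; data of Example 3.10.
xData = [-0.04; 0.93; 1.95; 2.90; 3.83; 5.00; ...
          5.98; 7.05; 8.21; 9.08; 10.09];
yData = [-8.66; -6.44; -4.36; -3.27; -0.88; 0.87; ...
          3.31; 4.63; 6.19; 7.40; 8.85];
m = 3;                               % m - 1 = 2, i.e., a quadratic
s = zeros(2*m-1,1); b = zeros(m,1);
for p = 0:2*m-2
    s(p+1) = sum(xData.^p);          % n, sum of x, sum of x^2, ...
end
for k = 1:m
    b(k) = sum(xData.^(k-1).*yData);
end
A = zeros(m);
for k = 1:m
    for j = 1:m
        A(k,j) = s(k+j-1);
    end
end
coeff = flipud(A\b);                 % highest power of x first
ref = polyfit(xData,yData,m-1)';     % built-in fit, for comparison
disp([coeff ref])
```

The two columns should agree to many digits for a low-order fit like this one; larger differences would be expected only when the normal equations become ill-conditioned.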

■ stdDev 

After the coefficients of the fitting polynomial have been obtained, the standard deviation
σ can be computed with the function stdDev. The polynomial evaluation in
stdDev is carried out by the subfunction polyEval, which is described in Art. 4.7 (see
Eq. (4.10)).

function sigma = stdDev(coeff,xData,yData)
% Returns the standard deviation between data
% points and the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% USAGE: sigma = stdDev(coeff,xData,yData)
% coeff = coefficients of the polynomial.
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.

m = length(coeff); n = length(xData);
sigma = 0;
for i = 1:n
    y = polyEval(coeff,xData(i));
    sigma = sigma + (yData(i) - y)^2;
end
sigma = sqrt(sigma/(n - m));

function y = polyEval(coeff,x)
% Returns the value of the polynomial at x.
m = length(coeff);
y = coeff(1);
for j = 1:m-1
    y = y*x + coeff(j+1);
end

Weighting of Data 

There are occasions when confidence in the accuracy of data varies from point to point. 
For example, the instrument taking the measurements may be more sensitive in a 
certain range of data. Sometimes the data represent the results of several experiments, 
each carried out under different circumstances. Under these conditions we may want 
to assign a confidence factor, or weight, to each data point and minimize the sum of 
the squares of the weighted residuals $r_i = W_i[y_i - f(x_i)]$, where $W_i$ are the weights.
Hence the function to be minimized is

$$ S(a_1, a_2, \ldots, a_m) = \sum_{i=1}^{n} W_i^2 \left[y_i - f(x_i)\right]^2 \tag{3.24} $$

This procedure forces the fitting function f(x) closer to the data points that have
higher weights.

Weighted linear regression 

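If the fitting function is the straight line f(x) = a + bx, Eq. (3.24) becomes

$$ S(a, b) = \sum_{i=1}^{n} W_i^2 (y_i - a - bx_i)^2 \tag{3.25} $$

The conditions for minimizing S are

$$ \frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} W_i^2 (y_i - a - bx_i) = 0 $$

$$ \frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} W_i^2 (y_i - a - bx_i)x_i = 0 $$

or

$$ a\sum_{i=1}^{n} W_i^2 + b\sum_{i=1}^{n} W_i^2 x_i = \sum_{i=1}^{n} W_i^2 y_i \tag{3.26a} $$

$$ a\sum_{i=1}^{n} W_i^2 x_i + b\sum_{i=1}^{n} W_i^2 x_i^2 = \sum_{i=1}^{n} W_i^2 x_i y_i \tag{3.26b} $$

Dividing Eq. (3.26a) by $\sum W_i^2$ and introducing the weighted averages

$$ \hat{x} = \frac{\sum W_i^2 x_i}{\sum W_i^2} \qquad \hat{y} = \frac{\sum W_i^2 y_i}{\sum W_i^2} \tag{3.27} $$

we obtain

$$ a = \hat{y} - b\hat{x} \tag{3.28a} $$

Substituting Eq. (3.28a) into Eq. (3.26b) and solving for b yields after some algebra

$$ b = \frac{\sum_{i=1}^{n} W_i^2 y_i (x_i - \hat{x})}{\sum_{i=1}^{n} W_i^2 x_i (x_i - \hat{x})} \tag{3.28b} $$

Note that Eqs. (3.28) are similar to Eqs. (3.19) for unweighted data.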


Fitting exponential functions 

A special application of weighted linear regression arises in fitting exponential functions
to data. Consider as an example the fitting function

$$ f(x) = ae^{bx} $$

Normally, the least-squares fit would lead to equations that are nonlinear in a and b.
But if we fit ln y rather than y, the problem is transformed to linear regression: fit the
function

$$ F(x) = \ln f(x) = \ln a + bx $$

to the data points $(x_i, \ln y_i)$, i = 1, 2, ..., n. This simplification comes at a price: least-squares
fit to the logarithm of the data is not the same as least-squares fit to the original
data. The residuals of the logarithmic fit are

$$ R_i = \ln y_i - F(x_i) = \ln y_i - \ln a - bx_i \tag{3.29a} $$

whereas the residuals used in fitting the original data are

$$ r_i = y_i - f(x_i) = y_i - ae^{bx_i} \tag{3.29b} $$
This discrepancy can be largely eliminated by weighting the logarithmic fit. We
note from Eq. (3.29b) that $\ln(y_i - r_i) = \ln(ae^{bx_i}) = \ln a + bx_i$, so that Eq. (3.29a) can
be written as

$$ R_i = \ln y_i - \ln(y_i - r_i) = -\ln\left(1 - \frac{r_i}{y_i}\right) $$

If the residuals $r_i$ are sufficiently small ($r_i \ll y_i$), we can use the approximation
$\ln(1 - r_i/y_i) \approx -r_i/y_i$, so that

$$ R_i \approx r_i/y_i $$
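
A quick numerical illustration shows how good this approximation is. The sketch below takes a hypothetical data value y = 10 and three residuals of decreasing size, and compares the logarithmic residual R = ln y - ln(y - r) with the ratio r/y; all numbers are made up for the purpose of the comparison.

```matlab
% Sketch: compare R = ln(y) - ln(y - r) with r/y for small residuals.
% Assumption: y and r are illustrative values, not taken from the text.
y = 10;
r = [1 0.1 0.01];
R = log(y) - log(y - r);
ratio = r./y;
for p = 1:length(r)
    fprintf('r/y = %.4f   R = %.6f\n', ratio(p), R(p))
end
```

The difference between R and r/y is of the order of (r/y)^2, so the two agree closely once the residual is a few percent of the data value or less.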

We can now see that by minimizing $\sum R_i^2$, we inadvertently introduced the weights
$1/y_i$. This effect can be negated if we apply the weights $y_i$ when fitting F(x) to
$(x_i, \ln y_i)$; that is, by minimizing

$$ S = \sum_{i=1}^{n} y_i^2 R_i^2 \tag{3.30} $$

Other examples that also benefit from the weights $W_i = y_i$ are given in Table 3.3.

    f(x)          F(x)                            Data to be fitted by F(x)
    a x e^(bx)    ln[f(x)/x] = ln a + bx          [x_i, ln(y_i/x_i)]
    a x^b         ln f(x) = ln a + b ln(x)        (ln x_i, ln y_i)

Table 3.3


EXAMPLE 3.8 

Fit a straight line to the data shown and compute the standard deviation.

    x    0.0    1.0    2.0    2.5    3.0
    y    2.9    3.7    4.1    4.4    5.0

Solution The averages of the data are

$$ \bar{x} = \frac{1}{5}\sum x_i = \frac{0.0 + 1.0 + 2.0 + 2.5 + 3.0}{5} = 1.7 $$

$$ \bar{y} = \frac{1}{5}\sum y_i = \frac{2.9 + 3.7 + 4.1 + 4.4 + 5.0}{5} = 4.02 $$

The intercept a and slope b of the interpolant can now be determined from Eq. (3.19):

$$ b = \frac{\sum y_i(x_i - \bar{x})}{\sum x_i(x_i - \bar{x})} = \frac{2.9(-1.7) + 3.7(-0.7) + 4.1(0.3) + 4.4(0.8) + 5.0(1.3)}{0.0(-1.7) + 1.0(-0.7) + 2.0(0.3) + 2.5(0.8) + 3.0(1.3)} = \frac{3.73}{5.8} = 0.6431 $$

$$ a = \bar{y} - \bar{x}b = 4.02 - 1.7(0.6431) = 2.927 $$

Therefore, the regression line is f(x) = 2.927 + 0.6431x, which is shown in the figure
together with the data points.

We start the evaluation of the standard deviation by computing the residuals:

    y           2.900     3.700    4.100     4.400     5.000
    f(x)        2.927     3.570    4.213     4.535     4.856
    y - f(x)    -0.027    0.130    -0.113    -0.135    0.144

The sum of the squares of the residuals is

$$ S = \sum \left[y_i - f(x_i)\right]^2 = (-0.027)^2 + (0.130)^2 + (-0.113)^2 + (-0.135)^2 + (0.144)^2 = 0.06936 $$

so that the standard deviation in Eq. (3.15) becomes

$$ \sigma = \sqrt{\frac{S}{n - m}} = \sqrt{\frac{0.06936}{5 - 2}} = 0.1520 $$
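
The hand computation of Example 3.8 can be reproduced with a few lines that apply Eqs. (3.19) and (3.15) directly. This is only a sketch: the variable names are illustrative, and the residuals are computed without the intermediate rounding used in the tables, so the last digit of the standard deviation may differ slightly from the hand value.

```matlab
% Sketch: straight-line fit of Example 3.8 from Eqs. (3.19) and (3.15).
% Assumption: variable names are illustrative only.
xData = [0.0; 1.0; 2.0; 2.5; 3.0];
yData = [2.9; 3.7; 4.1; 4.4; 5.0];
n = length(xData);
xBar = mean(xData); yBar = mean(yData);
b = sum(yData.*(xData - xBar))/sum(xData.*(xData - xBar));
a = yBar - xBar*b;
S = sum((yData - a - b*xData).^2);   % sum of squared residuals
sigma = sqrt(S/(n - 2));             % two fitted parameters, m = 2
fprintf('a = %.3f, b = %.4f, sigma = %.3f\n', a, b, sigma)
```

Running the sketch gives a = 2.927, b = 0.6431 and a standard deviation of about 0.152, in agreement with the hand solution.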
EXAMPLE 3.9

Determine the parameters a and b so that $f(x) = ae^{bx}$ fits the following data in the
least-squares sense.

    x    1.2    2.8     4.3     5.4     6.8      7.9
    y    7.5    16.1    38.9    67.0    146.6    266.2

Use two different methods: (1) fit $\ln y_i$; and (2) fit $\ln y_i$ with weights $W_i = y_i$. Compute
the standard deviation in each case.

Solution of Part (1) The problem is to fit the function $\ln(ae^{bx}) = \ln a + bx$ to the data

    x           1.2      2.8      4.3      5.4      6.8      7.9
    z = ln y    2.015    2.779    3.661    4.205    4.988    5.584

We are now dealing with linear regression, where the parameters to be found are
A = ln a and b. Following the steps in Example 3.8, we get (skipping some of the
arithmetic details)

$$ \bar{x} = \frac{1}{6}\sum x_i = 4.733 \qquad \bar{z} = \frac{1}{6}\sum z_i = 3.872 $$

$$ b = \frac{\sum z_i(x_i - \bar{x})}{\sum x_i(x_i - \bar{x})} = \frac{16.716}{31.153} = 0.5366 \qquad A = \bar{z} - \bar{x}b = 1.3323 $$

Therefore, $a = e^A = 3.790$ and the fitting function becomes $f(x) = 3.790e^{0.5366x}$. The
plots of f(x) and the data points are shown in the figure.

Here is the computation of standard deviation:

    y           7.50    16.10    38.90    67.00    146.60    266.20
    f(x)        7.21    17.02    38.07    68.69    145.60    262.72
    y - f(x)    0.29    -0.92    0.83     -1.69    1.00      3.48

$$ S = \sum \left[y_i - f(x_i)\right]^2 = 17.59 $$

$$ \sigma = \sqrt{\frac{S}{6 - 2}} = 2.10 $$

As pointed out before, this is an approximate solution of the stated problem, since
we did not fit $y_i$, but $\ln y_i$. Judging by the plot, the fit seems to be good.

Solution of Part (2) We again fit $\ln(ae^{bx}) = \ln a + bx$ to z = ln y, but this time the
weights $W_i = y_i$ are used. From Eqs. (3.27) the weighted averages of the data are (recall
that we fit z = ln y)

$$ \hat{x} = \frac{\sum y_i^2 x_i}{\sum y_i^2} = \frac{737.5 \times 10^3}{98.67 \times 10^3} = 7.474 $$

$$ \hat{z} = \frac{\sum y_i^2 z_i}{\sum y_i^2} = \frac{528.2 \times 10^3}{98.67 \times 10^3} = 5.353 $$

and Eqs. (3.28) yield for the parameters

$$ b = \frac{\sum y_i^2 z_i(x_i - \hat{x})}{\sum y_i^2 x_i(x_i - \hat{x})} = \frac{35.39 \times 10^3}{65.05 \times 10^3} = 0.5440 $$

$$ \ln a = \hat{z} - b\hat{x} = 5.353 - 0.5440(7.474) = 1.287 $$

Therefore,

$$ a = e^{\ln a} = e^{1.287} = 3.622 $$

so that the fitting function is $f(x) = 3.622e^{0.5440x}$. As expected, this result is somewhat
different from that obtained in Part (1).

The computations of the residuals and standard deviation are as follows:

    y           7.50    16.10    38.90    67.00    146.60    266.20
    f(x)        6.96    16.61    37.56    68.33    146.33    266.20
    y - f(x)    0.54    -0.51    1.34     -1.33    0.267     0.00

$$ S = \sum \left[y_i - f(x_i)\right]^2 = 4.186 $$

$$ \sigma = \sqrt{\frac{S}{6 - 2}} = 1.023 $$

Observe that the residuals and standard deviation are smaller than in Part (1), indicating
a better fit, as expected.

It can be shown that fitting $y_i$ directly (which involves the solution of a transcendental
equation) results in $f(x) = 3.614e^{0.5442x}$. The corresponding standard deviation
is σ = 1.022, which is very close to the result in Part (2).
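
Both parts of Example 3.9 follow the same recipe, Eqs. (3.27) and (3.28) applied to z = ln y, and differ only in the weights. The sketch below runs the two cases in one loop; unit weights reproduce Part (1), since Eqs. (3.28) then reduce to Eqs. (3.19). The variable names are illustrative and no intermediate rounding is applied, so the printed values may differ from the hand results in the last digit.

```matlab
% Sketch: Example 3.9 by weighted linear regression on z = ln(y).
% Assumption: weights{1} = 1 gives Part (1), weights{2} = y gives Part (2).
xData = [1.2; 2.8; 4.3; 5.4; 6.8; 7.9];
yData = [7.5; 16.1; 38.9; 67.0; 146.6; 266.2];
z = log(yData);
weights = {ones(size(yData)), yData};
a = zeros(2,1); b = zeros(2,1); sigma = zeros(2,1);
for p = 1:2
    W2 = weights{p}.^2;
    xHat = sum(W2.*xData)/sum(W2);               % Eq. (3.27)
    zHat = sum(W2.*z)/sum(W2);
    b(p) = sum(W2.*z.*(xData - xHat)) ...
          /sum(W2.*xData.*(xData - xHat));       % Eq. (3.28b)
    a(p) = exp(zHat - b(p)*xHat);                % Eq. (3.28a), a = e^(ln a)
    r = yData - a(p)*exp(b(p)*xData);            % residuals of original data
    sigma(p) = sqrt(sum(r.^2)/(length(xData) - 2));
    fprintf('Part (%d): a = %.3f, b = %.4f, sigma = %.3f\n', ...
            p, a(p), b(p), sigma(p))
end
```

The weighted case should show the smaller standard deviation, as observed in the hand solution.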

EXAMPLE 3.10 

Write a program that fits a polynomial of arbitrary degree k to the data points shown
below. Use the program to determine k that best fits this data in the least-squares
sense.

    x    -0.04    0.93     1.95     2.90     3.83     5.00
    y    -8.66    -6.44    -4.36    -3.27    -0.88    0.87
    x    5.98     7.05     8.21     9.08     10.09
    y    3.31     4.63     6.19     7.40     8.85

Solution The following program prompts for k. Execution is terminated by pressing
"return."

% Example 3.10 (Polynomial curve fitting)
xData = [-0.04,0.93,1.95,2.90,3.83,5.0,...
         5.98,7.05,8.21,9.08,10.09]';
yData = [-8.66,-6.44,-4.36,-3.27,-0.88,0.87,...
         3.31,4.63,6.19,7.4,8.85]';
format short e
while 1
    k = input('degree of polynomial = ');
    if isempty(k)          % Loop is terminated
        fprintf('Done')    % by pressing "return"
        break
    end
    coeff = polynFit(xData,yData,k+1)
    sigma = stdDev(coeff,xData,yData)
    fprintf('\n')
end

The results are:

degree of polynomial = 1
coeff =
   1.7286e+000
  -7.9453e+000
sigma =
   5.1128e-001

degree of polynomial = 2
coeff =
  -4.1971e-002
   2.1512e+000
  -8.5701e+000
sigma =
   3.1099e-001

degree of polynomial = 3
coeff =
  -2.9852e-003
   2.8845e-003
   1.9810e+000
  -8.4660e+000
sigma =
   3.1948e-001

degree of polynomial =
Done

Because the quadratic $f(x) = -0.041971x^2 + 2.1512x - 8.5701$ produces the
smallest standard deviation, it can be considered as the "best" fit to the data. But be
warned: the standard deviation is not an infallible measure of the goodness-of-fit. It is
always a good idea to plot the data points and f(x) before final determination is made.
The plot of our data indicates that the quadratic (solid line) is indeed a reasonable
choice for the fitting function.
PROBLEM SET 3.2

Instructions Plot the data points and the fitting function whenever appropriate. 

1. Show that the straight line obtained by least-squares fit of unweighted data always
passes through the point $(\bar{x}, \bar{y})$.

2. Use linear regression to find the line that fits the data

    x    -1.0     -0.5     0       0.5     1.0
    y    -1.00    -0.55    0.00    0.45    1.00

and determine the standard deviation.

3. Three tensile tests were carried out on an aluminum bar. In each test the strain
was measured at the same values of stress. The results were

    Stress (MPa)       34.5    69.0    103.5    138.0
    Strain (Test 1)    0.46    0.95    1.48     1.93
    Strain (Test 2)    0.34    1.02    1.51     2.09
    Strain (Test 3)    0.73    1.10    1.62     2.12

where the units of strain are mm/m. Use linear regression to estimate the modulus
of elasticity of the bar (modulus of elasticity = stress/strain).

4. Solve Prob. 3 assuming that the third test was performed on an inferior machine, 
so that its results carry only half the weight of the other two tests. 

5. ■ Fit a straight line to the following data and compute the standard deviation.

    x    0        0.5      1        1.5      2        2.5
    y    3.076    2.810    2.588    2.297    1.981    1.912
    x    3        3.5      4        4.5      5
    y    1.653    1.478    1.399    1.018    0.794

6. ■ The table displays the mass M and average fuel consumption φ of motor vehicles
manufactured by Ford and Honda in 1999. Fit a straight line φ = a + bM to the
data and compute the standard deviation.

    Model             M (kg)    φ (km/liter)
    Contour           1310      10.2
    Crown Victoria    1810      8.1
    Escort            1175      11.9
    Expedition        2360      5.5
    Explorer          1960      6.8
    F-150             2020      6.8
    Ranger            1755      7.7
    Taurus            1595      8.9
    Accord            1470      9.8
    CR-V              1430      10.2
    Civic             1110      13.2
    Passport          1785      7.7

7. ■ The relative density ρ of air was measured at various altitudes h. The results
were:

    h (km)    0    1.525     3.050     4.575     6.10      7.625     9.150
    ρ         1    0.8617    0.7385    0.6292    0.5328    0.4481    0.3741

Use a quadratic least-squares fit to determine the relative air density at h = 10.5
km. (This problem was solved by interpolation in Prob. 20, Problem Set 3.1.)

8. ■ Kinematic viscosity $\mu_k$ of water varies with temperature T as shown in the
table. Determine the cubic that best fits the data, and use it to compute $\mu_k$ at
T = 10°, 30°, 60°, and 90°C. (This problem was solved in Prob. 19, Problem Set 3.1
by interpolation.)

    T (°C)             0       21.1    37.8     54.4     71.1     87.8     100
    μ_k (10⁻³ m²/s)    1.79    1.13    0.696    0.519    0.338    0.321    0.296


9. ■ Fit a straight line and a quadratic to the data

    x    1.0      2.5       3.5       4.0       1.1      1.8      2.2       3.7
    y    6.008    15.722    27.130    33.772    5.257    9.549    11.098    28.828

Which is a better fit?
10. ■ The table displays thermal efficiencies of some early steam engines.⁶ Determine
the polynomial that provides the best fit to the data and use it to predict the thermal
efficiency in the year 2000.

    Year    Efficiency (%)    Type
    1718    0.5               Newcomen
    1767    0.8               Smeaton
    1774    1.4               Smeaton
    1775    2.7               Watt
    1792    4.5               Watt
    1816    7.5               Woolf compound
    1828    12.0              Improved Cornish
    1834    17.0              Improved Cornish
    1878    17.2              Corliss compound
    1906    23.0              Triple expansion

11. The table shows the variation of the relative thermal conductivity k of sodium
with temperature T. Find the quadratic that fits the data in the least-squares
sense.

    T (°C)    79      190      357      524      690
    k         1.00    0.932    0.839    0.759    0.693


12. Let $f(x) = ax^b$ be the least-squares fit of the data $(x_i, y_i)$, i = 1, 2, ..., n, and let
F(x) = ln a + b ln x be the least-squares fit of $(\ln x_i, \ln y_i)$ (see Table 3.3). Prove
that $R_i \approx r_i/y_i$, where the residuals are $r_i = y_i - f(x_i)$ and $R_i = \ln y_i - F(x_i)$. Assume
that $r_i \ll y_i$.

13. Determine a and b for which f(x) = a sin(πx/2) + b cos(πx/2) fits the following
data in the least-squares sense.

    x    -0.5      -0.19     0.02      0.20      0.35      0.50
    y    -3.558    -2.874    -1.995    -1.040    -0.068    0.677

6 Source: Singer, C., Holmyard, E.J., Hall, A.R., and Williams, T.H., A History of Technology, Oxford
University Press, 1958.

14. Determine $a$ and $b$ so that $f(x) = ax^b$ fits the following data in the least-squares
sense.

x    0.5     1.0     1.5     2.0     2.5
y    0.49    1.60    3.36    6.44    10.16


15. Fit the function $f(x) = axe^{bx}$ to the data and compute the standard deviation.

x    0.5      1.0      1.5      2.0      2.5
y    0.541    0.398    0.232    0.106    0.052


16. ■ The intensity of radiation of a radioactive substance was measured at half-year 
intervals. The results were: 


t (years)    0        0.5      1        1.5      2        2.5
y            1.000    0.994    0.990    0.985    0.979    0.977

t (years)    3        3.5      4        4.5      5        5.5
y            0.972    0.969    0.967    0.960    0.956    0.952

where $y$ is the relative intensity of radiation. Knowing that radioactivity decays
exponentially with time, $y(t) = ae^{-bt}$, estimate the radioactive half-life of the
substance.

MATLAB Functions 

y = interp1(xData, yData, x, method) returns the value of the interpolant y at
point x according to the method specified: method = 'linear' uses linear
interpolation between adjacent data points (this is the default); method =
'spline' carries out cubic spline interpolation. If x is an array, y is computed
for all elements of x.

a = polyfit(xData, yData, m) returns the coefficients a of a polynomial of degree
m that fits the data points in the least-squares sense.

y = polyval(a, x) evaluates a polynomial defined by its coefficients a at point x.
If x is an array, y is computed for all elements of x.

s = std(x) returns the standard deviation of the elements of array x. If x is a matrix,
s is computed for each column of x.

xbar = mean(x) computes the mean value of the elements of x. If x is a matrix,
xbar is computed for each column of x.
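
As a quick illustration of how the functions listed above fit together, the following sketch fits a straight line to a small data set and inspects the residuals. The data values are hypothetical (they are not taken from the text), and the variable names yFit, yMid and r are chosen only for this example.

```matlab
% Hypothetical data: roughly y = 1 + 2x with small perturbations.
xData = [0.0; 1.0; 2.0; 3.0; 4.0];
yData = [1.1; 2.9; 5.2; 6.8; 9.1];

a = polyfit(xData, yData, 1);        % a(1) = slope, a(2) = intercept
yFit = polyval(a, xData);            % fitted values at the data points
yMid = interp1(xData, yData, 2.5);   % linear interpolation (the default method)
r = yData - yFit;                    % residuals of the fit

fprintf('slope = %.4f, intercept = %.4f\n', a(1), a(2));
fprintf('interp1 at x = 2.5: %.4f\n', yMid);
fprintf('mean residual = %.2e, std of residuals = %.4f\n', mean(r), std(r));
```

Note that std normalizes by the number of samples minus one, which may differ from the divisor used in a standard deviation formula for curve fits.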

Linear forms can be fitted to data by setting up the overdetermined equations in
Eq. (3.22)

$$\mathbf{F}\mathbf{a} = \mathbf{y}$$

and solving them with the command a = F\y (recall that for overdetermined equations
the backslash operator returns the least-squares solution). Here is an illustration
of how to fit

$$f(x) = a_1 + a_2 e^{x} + a_3 x e^{-x}$$

to the data in Example 3.9:

xData = [1.2; 2.8; 4.3; 5.4; 6.8; 7.0];
yData = [7.5; 16.1; 38.9; 67.0; 146.6; 266.2];
F = ones(length(xData),3);
F(:,2) = exp(xData(:));
F(:,3) = xData(:).*exp(-xData(:));
a = F\yData

Roots of Equations


Find the solutions of $f(x) = 0$, where the function $f$ is given


4.1 Introduction 

A common problem encountered in engineering analysis is this: given a function $f(x)$,
determine the values of $x$ for which $f(x) = 0$. The solutions (values of $x$) are known as
the roots of the equation $f(x) = 0$, or the zeroes of the function $f(x)$.

Before proceeding further, it might be helpful to review the concept of a function. 
The equation 

$$y = f(x)$$

contains three elements: an input value $x$, an output value $y$ and the rule $f$ for computing
$y$. The function is said to be given if the rule $f$ is specified. In numerical computing
the rule is invariably a computer algorithm. It may be a function statement, such as

$$f(x) = \cosh(x)\cos(x) - 1$$

or a complex procedure containing hundreds or thousands of lines of code. As long 
as the algorithm produces an output y for each input x, it qualifies as a function. 

The roots of equations may be real or complex. The complex roots are seldom 
computed, since they rarely have physical significance. An exception is the polynomial 
equation 

$$a_1 x^n + a_2 x^{n-1} + \cdots + a_n x + a_{n+1} = 0$$

where the complex roots may be meaningful (as in the analysis of damped vibrations, 
for example). For the time being, we will concentrate on finding the real roots of 
equations. Complex zeroes of polynomials are treated near the end of this chapter. 

In general, an equation may have any number of (real) roots, or no roots at all. 
For example, 


$$\sin x - x = 0$$

has a single root, namely x = 0, whereas 

tan x - x = 0 

has an infinite number of roots (x = 0, ±4.493, ±7.725,...). 

All methods of finding roots are iterative procedures that require a starting point, 
i.e., an estimate of the root. This estimate can be crucial; with a bad starting value the
procedure may fail to converge, or it may converge to the “wrong” root (a root different from the one
sought). There is no universal recipe for estimating the value of a root. If the equation is
associated with a physical problem, then the context of the problem (physical insight) 
might suggest the approximate location of the root. Otherwise, the function must be 
plotted, or a systematic numerical search for the roots can be carried out. One such 
search method is described in the next article. 

It is highly advisable to go a step further and bracket the root (determine its lower 
and upper bounds) before passing the problem to a root-finding algorithm. Prior 
bracketing is, in fact, mandatory in the methods described in this chapter. 


4.2 Incremental Search Method 

The approximate locations of the roots are best determined by plotting the function. 
Often a very rough plot, based on a few points, is sufficient to give us reasonable starting
values. Another useful tool for detecting and bracketing roots is the incremental search 
method. It can also be adapted for computing roots, but the effort would not be 
worthwhile, since other methods described in this chapter are more efficient for that. 

The basic idea behind the incremental search method is simple: if $f(x_1)$ and
$f(x_2)$ have opposite signs, then there is at least one root in the interval $(x_1, x_2)$. If the
interval is small enough, it is likely to contain a single root. Thus the zeroes of $f(x)$ can
be detected by evaluating the function at intervals $\Delta x$ and looking for a change in sign.
There are several potential problems with the incremental search method:

• It is possible to miss two closely spaced roots if the search increment $\Delta x$ is larger
than the spacing of the roots.

• A double root (two roots that coincide) will not be detected. 

• Certain singularities of $f(x)$ can be mistaken for roots. For example, $f(x) = \tan x$
changes sign at $x = \pm\tfrac{1}{2}n\pi$, $n = 1, 3, 5, \ldots$, as shown in Fig. 4.1. However, these
locations are not true zeroes, since the function does not cross the x-axis.
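
The last pitfall can be made concrete with a short sketch. It scans $\tan x$ over a grid that straddles the singularity at $\pi/2$ and reports the first sign change. The grid limits and the increment are arbitrary choices made for this illustration, and the variable names are not from the text.

```matlab
f = @(x) tan(x);
dx = 0.1;                 % search increment (arbitrary choice)
x = 1.0:dx:2.0;           % grid that straddles the singularity at pi/2
y = f(x);

% Index of the first pair of adjacent grid points with opposite signs.
k = find(y(1:end-1).*y(2:end) < 0, 1);
x1 = x(k); x2 = x(k+1);

% Near a true root |f| is small at both ends of the bracket;
% near a singularity it is large.
fprintf('sign change in (%.1f, %.1f): |f| = %.2f and %.2f\n', ...
        x1, x2, abs(y(k)), abs(y(k+1)));
```

The large magnitudes of the function at both ends of the reported interval are the clue that the sign change comes from a singularity rather than a root.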

■ rootsearch

The function rootsearch looks for a zero of the function $f(x)$ in the interval (a, b).
The search starts at a and proceeds in steps dx toward b. Once a zero is detected,
rootsearch returns its bounds (x1,x2) to the calling program. If a root was not
detected, x1 = x2 = NaN is returned (in MATLAB NaN stands for “not a number”).
After the first root (the root closest to a) has been bracketed, rootsearch can be
called again with a replaced by x2 in order to find the next root. This can be repeated
as long as rootsearch detects a root.

function [x1,x2] = rootsearch(func,a,b,dx)
% Incremental search for a root of f(x).
% USAGE: [x1,x2] = rootsearch(func,a,b,dx)
% INPUT:
% func = handle of function that returns f(x).
% a,b = limits of search.
% dx = search increment.
% OUTPUT:
% x1,x2 = bounds on the smallest root in (a,b);
%         set to NaN if no root was detected

x1 = a; f1 = feval(func,x1);
x2 = a + dx; f2 = feval(func,x2);
while f1*f2 > 0.0
    if x1 >= b
        x1 = NaN; x2 = NaN; return
    end
    x1 = x2; f1 = f2;
    x2 = x1 + dx; f2 = feval(func,x2);
end

EXAMPLE 4.1

Use incremental search with $\Delta x = 0.2$ to bracket the smallest positive zero of
$f(x) = x^3 - 10x^2 + 5$.

Solution We evaluate $f(x)$ at intervals $\Delta x = 0.2$, starting at $x = 0$, until the function
changes its sign (the value of the function is of no interest to us; only its sign is relevant).
This procedure yields the following results:

x      f(x)
0.0     5.000
0.2     4.608
0.4     3.464
0.6     1.616
0.8    -0.888

From the sign change of the function we conclude that the smallest positive zero lies
between $x = 0.6$ and $x = 0.8$.


4.3 Method of Bisection 

After a root of $f(x) = 0$ has been bracketed in the interval $(x_1, x_2)$, several methods can
be used to close in on it. The method of bisection accomplishes this by successively
halving the interval until it becomes sufficiently small. This technique is also known
as the interval halving method. Bisection is not the fastest method available for computing
roots, but it is the most reliable. Once a root has been bracketed, bisection will
always close in on it.

The method of bisection uses the same principle as incremental search: if there is
a root in the interval $(x_1, x_2)$, then $f(x_1) \cdot f(x_2) < 0$. In order to halve the interval, we
compute $f(x_3)$, where $x_3 = \tfrac{1}{2}(x_1 + x_2)$ is the midpoint of the interval. If $f(x_2) \cdot f(x_3) < 0$,
then the root must be in $(x_3, x_2)$ and we record this by replacing the original bound
$x_1$ by $x_3$. Otherwise, the root lies in $(x_1, x_3)$, in which case $x_2$ is replaced by $x_3$. In either
case, the new interval $(x_1, x_2)$ is half the size of the original interval. The bisection is
repeated until the interval has been reduced to a small value $\varepsilon$, so that

$$|x_2 - x_1| < \varepsilon$$

It is easy to compute the number of bisections required to reach a prescribed
$\varepsilon$. The original interval $\Delta x$ is reduced to $\Delta x/2$ after one bisection, $\Delta x/2^2$ after
two bisections and after $n$ bisections it is $\Delta x/2^n$. Setting $\Delta x/2^n = \varepsilon$ and solving
for $n$, we get

$$n = \frac{\ln(|\Delta x|/\varepsilon)}{\ln 2} \tag{4.1}$$
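
As a small worked instance of Eq. (4.1), the sketch below counts the bisections needed to shrink a bracket of width 0.2 to a target width. The target value of 1.0e-4 is a hypothetical choice for this illustration; since the number of bisections must be an integer, the result is rounded up.

```matlab
dx0 = 0.2;        % width of the initial bracket, e.g. (0.6, 0.8)
tol = 1.0e-4;     % hypothetical target width of the final interval

% Eq. (4.1), rounded up to the next integer.
n = ceil(log(abs(dx0)/tol)/log(2.0));

fprintf('bisections required: %d\n', n);
fprintf('final interval width: %.3e\n', dx0/2^n);
```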


■ bisect 

This function uses the method of bisection to compute the root of $f(x) = 0$ that is
known to lie in the interval (x1, x2). The number of bisections n required to reduce
the interval to tol is computed from Eq. (4.1). The input argument filter controls
the filtering of suspected singularities. By setting filter = 1, we force the routine
to check whether the magnitude of $f(x)$ decreases with each interval halving. If it does
not, the “root” may not be a root at all, but a singularity, in which case root = NaN is
returned. Since this feature is not always desirable, the default value is filter = 0.

function root = bisect(func,x1,x2,filter,tol)
% Finds a bracketed zero of f(x) by bisection.
% USAGE: root = bisect(func,x1,x2,filter,tol)
% INPUT:
% func = handle of function that returns f(x).
% x1,x2 = limits on interval containing the root.
% filter = singularity filter: 0 = off (default), 1 = on.
% tol = error tolerance (default is 1.0e4*eps).
% OUTPUT:
% root = zero of f(x), or NaN if singularity suspected.

if nargin < 5; tol = 1.0e4*eps; end
if nargin < 4; filter = 0; end
f1 = feval(func,x1);
if f1 == 0.0; root = x1; return; end
f2 = feval(func,x2);
if f2 == 0.0; root = x2; return; end
if f1*f2 > 0
    error('Root is not bracketed in (x1,x2)')
end
n = ceil(log(abs(x2 - x1)/tol)/log(2.0));
for i = 1:n
    x3 = 0.5*(x1 + x2);
    f3 = feval(func,x3);
    % Singularity filter: |f| should not grow as the interval is halved.
    if (filter == 1) && (abs(f3) > abs(f1)) ...
                     && (abs(f3) > abs(f2))
        root = NaN; return
    end
    if f3 == 0.0
        root = x3; return
    end
    if f2*f3 < 0.0
        x1 = x3; f1 = f3;
    else
        x2 = x3; f2 = f3;
    end
end
root = (x1 + x2)/2;


EXAMPLE 4.2 

Use bisection to find the root of $f(x) = x^3 - 10x^2 + 5 = 0$ that lies in the interval
(0.6, 0.8).

Solution The best way to implement the method is to use the table shown below. Note
that the interval to be bisected is determined by the sign of $f(x)$, not its magnitude.

x                                    f(x)      Interval
0.6                                   1.616    -
0.8                                  -0.888    (0.6, 0.8)
(0.6 + 0.8)/2 = 0.7                   0.443    (0.7, 0.8)
(0.8 + 0.7)/2 = 0.75                 -0.203    (0.7, 0.75)
(0.7 + 0.75)/2 = 0.725                0.125    (0.725, 0.75)
(0.75 + 0.725)/2 = 0.7375            -0.038    (0.725, 0.7375)
(0.725 + 0.7375)/2 = 0.73125          0.044    (0.7375, 0.73125)
(0.7375 + 0.73125)/2 = 0.73438        0.003    (0.7375, 0.73438)
(0.7375 + 0.73438)/2 = 0.73594       -0.017    (0.73438, 0.73594)
(0.73438 + 0.73594)/2 = 0.73516      -0.007    (0.73438, 0.73516)
(0.73438 + 0.73516)/2 = 0.73477      -0.002    (0.73438, 0.73477)
(0.73438 + 0.73477)/2 = 0.73458       0.000    -


The final result x = 0.7346 is correct within four decimal places. 
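
The hand computation above can be mirrored by a few lines of code. The following is only a sketch of the interval-halving loop for this particular function, written with an anonymous function instead of calling bisect; the fixed count of ten bisections is chosen to match the number of midpoints in the table.

```matlab
f = @(x) x^3 - 10*x^2 + 5;
x1 = 0.6; x2 = 0.8;               % initial bracket
f1 = f(x1); f2 = f(x2);
for i = 1:10
    x3 = 0.5*(x1 + x2);           % midpoint of the current bracket
    f3 = f(x3);
    if f2*f3 < 0.0
        x1 = x3; f1 = f3;         % root lies in (x3, x2)
    else
        x2 = x3; f2 = f3;         % root lies in (x1, x3)
    end
    fprintf('%2d   x3 = %.5f   f(x3) = %7.3f\n', i, x3, f3);
end
root = 0.5*(x1 + x2);             % midpoint of the final bracket
fprintf('root = %.4f\n', root);
```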

EXAMPLE 4.3

Find all the zeroes of $f(x) = x - \tan x$ in the interval (0, 20) by the method of bisection.
Utilize the functions rootsearch and bisect.

Solution Note that $\tan x$ is singular and changes sign at $x = \pi/2, 3\pi/2, \ldots$. To prevent
bisect from mistaking these points for roots, we set filter = 1. The closeness of roots
to the singularities is another potential problem that can be alleviated by using small
$\Delta x$ in rootsearch. Choosing $\Delta x = 0.01$, we arrive at the following program:

% Example 4.3 (root finding with bisection)
a = 0.0; b = 20.0; dx = 0.01;
nroots = 0;
while 1
    [x1,x2] = rootsearch(@fex4_3,a,b,dx);
    if isnan(x1)
        break
    else
        a = x2;
        x = bisect(@fex4_3,x1,x2,1);
        if ~isnan(x)
            nroots = nroots + 1;
            root(nroots) = x;
        end
    end
end
root

Recall that in MATLAB the symbol @ before a function name creates a handle for 
the function. Thus the input argument @fex4_3 in rootsearch is a handle for the 
function fex4_3 listed below.

function y = fex4_3(x)
% Function used in Example 4.3
y = x - tan(x);

Running the program resulted in the output

>> root =
         0    4.4934    7.7253   10.9041   14.0662   17.2208

4.4 Brent's Method


Brent’s method⁷ combines bisection and quadratic interpolation into an efficient
root-finding algorithm. In most problems the method is much faster than bisection
alone, but it can become sluggish if the function is not smooth. It is the recommended
method of root finding if the derivative of the function is difficult or impossible to
compute.

Figure 4.2. Inverse quadratic iteration: (a) case of $x < x_3$; (b) case of $x > x_3$.


Brent’s method assumes that a root of $f(x) = 0$ has been initially bracketed in
the interval $(x_1, x_2)$. The root-finding process starts with a bisection step that halves
the interval to either $(x_1, x_3)$ or $(x_3, x_2)$, where $x_3 = (x_1 + x_2)/2$, as shown in Figs. 4.2(a)
and (b). In the course of bisection we had to compute $f_1 = f(x_1)$, $f_2 = f(x_2)$ and
$f_3 = f(x_3)$, so that we now know three points on the $f(x)$ curve (the open circles in
the figure). These points allow us to carry out the next iteration of the root by inverse
quadratic interpolation (viewing $x$ as a quadratic function of $f$). If the result $x$ of the
interpolation falls inside the latest bracket (as is the case in Fig. 4.2), we accept the
result. Otherwise, another round of bisection is applied.



Figure 4.3. Relabeling points after an iteration: panels (a) and (b).


The next step is to relabel $x$ as $x_3$ and rename the limits of the new interval
$x_1$ and $x_2$ ($x_1 < x_3 < x_2$), as indicated in Fig. 4.3. We have now recovered the original
sequencing of points in Fig. 4.2, but the interval $(x_1, x_2)$ containing the root
has been reduced. This completes the first iteration cycle. In the next cycle another
inverse quadratic interpolation is attempted and the process is repeated until
the convergence criterion $|x - x_3| < \varepsilon$ is satisfied, where $\varepsilon$ is a prescribed error
tolerance.

7 Brent, R. P., Algorithms for Minimization without Derivatives, Prentice-Hall, 1973.

The inverse quadratic interpolation is carried out with Lagrange’s three-point
interpolant described in Art. 3.2. Interchanging the roles of $x$ and $f$, we have

$$x(f) = \frac{(f - f_2)(f - f_3)}{(f_1 - f_2)(f_1 - f_3)}x_1 + \frac{(f - f_1)(f - f_3)}{(f_2 - f_1)(f_2 - f_3)}x_2 + \frac{(f - f_1)(f - f_2)}{(f_3 - f_1)(f_3 - f_2)}x_3$$

Setting $f = 0$ and simplifying, we obtain for the estimate of the root

$$x = x(0) = -\frac{f_2 f_3 x_1(f_2 - f_3) + f_3 f_1 x_2(f_3 - f_1) + f_1 f_2 x_3(f_1 - f_2)}{(f_1 - f_2)(f_2 - f_3)(f_3 - f_1)}$$

The change in the root is

$$\Delta x = x - x_3 = f_3\,\frac{x_3(f_1 - f_2)(f_2 - f_3 + f_1) + f_2 x_1(f_2 - f_3) + f_1 x_2(f_3 - f_1)}{(f_2 - f_1)(f_3 - f_1)(f_2 - f_3)} \tag{4.2}$$
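
A brief numerical check can confirm that Eq. (4.2) is just a rearrangement of the Lagrange interpolant evaluated at $f = 0$. The sketch below uses three points on $f(x) = x^3 - 10x^2 + 5$, the function from the earlier examples; the variable names xRoot, numer and denom are chosen for this illustration.

```matlab
f = @(x) x.^3 - 10*x.^2 + 5;
x1 = 0.6; x2 = 0.8; x3 = 0.7;        % bracket ends and their midpoint
f1 = f(x1); f2 = f(x2); f3 = f(x3);

% Lagrange's three-point interpolant x(f), evaluated at f = 0.
xRoot = x1*f2*f3/((f1 - f2)*(f1 - f3)) ...
      + x2*f1*f3/((f2 - f1)*(f2 - f3)) ...
      + x3*f1*f2/((f3 - f1)*(f3 - f2));

% The same step expressed as the change in the root, Eq. (4.2).
numer = x3*(f1 - f2)*(f2 - f3 + f1) + f2*x1*(f2 - f3) + f1*x2*(f3 - f1);
denom = (f2 - f1)*(f3 - f1)*(f2 - f3);
dx = f3*numer/denom;

fprintf('x(0) = %.5f,  x3 + dx = %.5f\n', xRoot, x3 + dx);
```

Both expressions should agree to within roundoff, since the three Lagrange weights sum to one.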


■ brent 

The function brent listed below is a simplified version of the algorithm proposed by 
Brent. It omits some of Brent’s safeguards against slow convergence; it also uses a less 
sophisticated convergence criterion. 

function root = brent(func,a,b,tol)
% Finds a root of f(x) = 0 by combining quadratic
% interpolation with bisection (Brent's method).
% USAGE: root = brent(func,a,b,tol)
% INPUT:
% func = handle of function that returns f(x).
% a,b = limits of the interval containing the root.
% tol = error tolerance (default is 1.0e6*eps).
% OUTPUT:
% root = zero of f(x) (root = NaN if failed to converge).

if nargin < 4; tol = 1.0e6*eps; end
% First step is bisection
x1 = a; f1 = feval(func,x1);
if f1 == 0; root = x1; return; end
x2 = b; f2 = feval(func,x2);
if f2 == 0; root = x2; return; end
if f1*f2 > 0.0
    error('Root is not bracketed in (a,b)')
end
x3 = 0.5*(a + b);
% Beginning of iterative loop.
for i = 1:30
    f3 = feval(func,x3);
    if abs(f3) < tol
        root = x3; return
    end
    % Tighten brackets (a,b) on the root.
    if f1*f3 < 0.0; b = x3;
    else; a = x3;
    end
    if (b - a) < tol*max(abs(b),1.0)
        root = 0.5*(a + b); return
    end
    % Try quadratic interpolation.
    denom = (f2 - f1)*(f3 - f1)*(f2 - f3);
    numer = x3*(f1 - f2)*(f2 - f3 + f1) ...
          + f2*x1*(f2 - f3) + f1*x2*(f3 - f1);
    % If division by zero, push x out of bracket
    % to force bisection.
    if denom == 0; dx = b - a;
    else; dx = f3*numer/denom;
    end
    x = x3 + dx;
    % If interpolation goes out of bracket, use bisection.
    if (b - x)*(x - a) < 0.0
        dx = 0.5*(b - a); x = a + dx;
    end
    % Let x3 <-- x & choose new x1, x2 so that x1 < x3 < x2.
    if x < x3
        x2 = x3; f2 = f3;
    else
        x1 = x3; f1 = f3;
    end
    x3 = x;
end
root = NaN;





EXAMPLE 4.4 

Determine the root of $f(x) = x^3 - 10x^2 + 5 = 0$ that lies in (0.6, 0.8) with Brent’s
method.

Solution

Bisection The starting points are

$$x_1 = 0.6 \qquad f_1 = 0.6^3 - 10(0.6)^2 + 5 = 1.616$$

$$x_2 = 0.8 \qquad f_2 = 0.8^3 - 10(0.8)^2 + 5 = -0.888$$

Bisection yields the point

$$x_3 = 0.7 \qquad f_3 = 0.7^3 - 10(0.7)^2 + 5 = 0.443$$

By inspecting the signs of $f$ we conclude that the new brackets on the root are
$(x_3, x_2) = (0.7, 0.8)$.

First interpolation cycle Substituting the above values of $x$ and $f$ into the numerator
of the quotient in Eq. (4.2), we get

$$\begin{aligned}
\text{num} &= x_3(f_1 - f_2)(f_2 - f_3 + f_1) + f_2 x_1(f_2 - f_3) + f_1 x_2(f_3 - f_1) \\
&= 0.7(1.616 + 0.888)(-0.888 - 0.443 + 1.616) \\
&\quad - 0.888(0.6)(-0.888 - 0.443) + 1.616(0.8)(0.443 - 1.616) \\
&= -0.307\,75
\end{aligned}$$

and the denominator becomes

$$\begin{aligned}
\text{den} &= (f_2 - f_1)(f_3 - f_1)(f_2 - f_3) \\
&= (-0.888 - 1.616)(0.443 - 1.616)(-0.888 - 0.443) = -3.9094
\end{aligned}$$

Therefore,

$$\Delta x = f_3\,\frac{\text{num}}{\text{den}} = 0.443\,\frac{(-0.307\,75)}{(-3.9094)} = 0.034\,87$$

and

$$x = x_3 + \Delta x = 0.7 + 0.034\,87 = 0.734\,87$$

Since the result is within the established brackets, we accept it.

Relabel points As $x > x_3$, the points are relabeled as illustrated in Figs. 4.2(b) and
4.3(b):

$$x_3 \leftarrow x = 0.734\,87$$

$$f_3 = 0.734\,87^3 - 10(0.734\,87)^2 + 5 = -0.003\,48$$

The new brackets on the root are $(x_1, x_3) = (0.7, 0.734\,87)$.

Second interpolation cycle Applying the interpolation in Eq. (4.2) again, we obtain
(skipping the arithmetical details)

$$\Delta x = -0.000\,27$$

$$x = x_3 + \Delta x = 0.734\,87 - 0.000\,27 = 0.734\,60$$

Again $x$ falls within the latest brackets, so the result is acceptable. At this stage, $x$ is
correct to five decimal places.

EXAMPLE 4.5 

Compute the zero of 

$$f(x) = x\,|\cos x| - 1$$

that lies in the interval (0, 4) with Brent’s method. 

Solution 



The plot of $f(x)$ shows that this is a rather nasty function within the specified interval,
containing a slope discontinuity and two local maxima. The sensible approach is
to avoid the potentially troublesome regions of the function by bracketing the root
as tightly as possible from a visual inspection of the plot. In this case, the interval
$(a, b) = (2.0, 2.2)$ would be a good starting point for Brent’s algorithm.

Is Brent’s method robust enough to handle the problem with the original brackets
(0, 4)? Well, here is the MATLAB command and its output:

>> brent(@fex4_5,0.0,4.0)
ans =
    2.0739

The result was obtained after six iterations. The function defining $f(x)$ is

function y = fex4_5(x) 

% Function used in Example 4.5 
y = x*abs(cos(x)) - 1.0; 


4.5 Newton-Raphson Method 


The Newton-Raphson algorithm is the best-known method of finding roots for a
good reason: it is simple and fast. The only drawback of the method is that it uses
the derivative $f'(x)$ of the function as well as the function $f(x)$ itself. Therefore, the
Newton-Raphson method is usable only in problems where $f'(x)$ can be readily computed.

The Newton-Raphson formula can be derived from the Taylor series expansion
of $f(x)$ about $x_i$:

$$f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1} - x_i) + O(x_{i+1} - x_i)^2 \tag{a}$$

If $x_{i+1}$ is a root of $f(x) = 0$, Eq. (a) becomes

$$0 = f(x_i) + f'(x_i)(x_{i+1} - x_i) + O(x_{i+1} - x_i)^2 \tag{b}$$

Assuming that $x_i$ is close to $x_{i+1}$, we can drop the last term in Eq. (b) and solve for
$x_{i+1}$. The result is the Newton-Raphson formula

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} \tag{4.3}$$

If $x$ denotes the true value of the root, the error in $x_i$ is $E_i = x - x_i$. It can be shown
that if $x_{i+1}$ is computed from Eq. (4.3), the corresponding error is

$$E_{i+1} = -\frac{f''(x_i)}{2f'(x_i)}E_i^2$$

indicating that the Newton-Raphson method converges quadratically (the error is
proportional to the square of the error in the previous step). As a consequence, the number of
significant figures is roughly doubled in every iteration, provided that $x_i$ is close to the root.

Figure 4.4. Graphical interpretation of the Newton-Raphson formula.

A graphical depiction of the Newton-Raphson formula is shown in Fig. 4.4. The formula
approximates $f(x)$ by the straight line that is tangent to the curve at $x_i$. Thus $x_{i+1}$
is at the intersection of the x-axis and the tangent line.

The algorithm for the Newton-Raphson method is simple: it repeatedly applies 
Eq. (4.3), starting with an initial value x 0 , until the convergence criterion 


$$|x_{i+1} - x_i| < \varepsilon$$


is reached, e being the error tolerance. Only the latest value of x has to be stored. Here 
is the algorithm: 

1. Let $x$ be a guess for the root of $f(x) = 0$.

2. Compute $\Delta x = -f(x)/f'(x)$.

3. Let $x \leftarrow x + \Delta x$ and repeat steps 2-3 until $|\Delta x| < \varepsilon$.
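
The three steps above translate almost line for line into code. The following sketch applies them to $f(x) = x^3 - 10x^2 + 5$ with the starting value 0.7; the tolerance and the cap of 30 iterations are assumptions made for this illustration, and no bracketing safeguard is included.

```matlab
f  = @(x) x^3 - 10*x^2 + 5;
df = @(x) 3*x^2 - 20*x;       % derivative f'(x)
x = 0.7;                      % step 1: initial guess
tol = 1.0e-9;                 % hypothetical error tolerance
for i = 1:30
    dx = -f(x)/df(x);         % step 2
    x = x + dx;               % step 3
    fprintf('%d   x = %.10f   |dx| = %.2e\n', i, x, abs(dx));
    if abs(dx) < tol; break; end
end
```

The printed magnitudes of dx shrink roughly as the square of the previous value, which is the quadratic convergence described earlier.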


Figure 4.5. Examples where the Newton-Raphson method diverges.


Although the Newton-Raphson method converges fast near the root, its global 
convergence characteristics are poor. The reason is that the tangent line is not al¬ 
ways an acceptable approximation of the function, as illustrated in the two examples 
in Fig. 4.5. But the method can be made nearly fail-safe by combining it with bisection, 
as in Brent’s method. 

■ newtonRaphson 

The following safe version of the Newton-Raphson method assumes that the root
to be computed is initially bracketed in (a, b). The midpoint of the bracket is used
as the initial guess of the root. The brackets are updated after each iteration. If a
Newton-Raphson iteration does not stay within the brackets, it is disregarded and
replaced with bisection. Since newtonRaphson uses the function f(x) as well as its
derivative, function routines for both (denoted by func and dfunc in the listing) must
be provided by the user.

function root = newtonRaphson(func,dfunc,a,b,tol)
% Newton-Raphson method combined with bisection for
% finding a root of f(x) = 0.
% USAGE: root = newtonRaphson(func,dfunc,a,b,tol)
% INPUT:
% func = handle of function that returns f(x).
% dfunc = handle of function that returns f'(x).
% a,b = brackets (limits) of the root.
% tol = error tolerance (default is 1.0e6*eps).
% OUTPUT:
% root = zero of f(x) (root = NaN if no convergence).

if nargin < 5; tol = 1.0e6*eps; end
fa = feval(func,a); fb = feval(func,b);
if fa == 0; root = a; return; end
if fb == 0; root = b; return; end
if fa*fb > 0.0
    error('Root is not bracketed in (a,b)')
end
x = (a + b)/2.0;
for i = 1:30
    fx = feval(func,x);
    if abs(fx) < tol; root = x; return; end
    % Tighten brackets on the root
    if fa*fx < 0.0; b = x;
    else; a = x;
    end
    % Try Newton-Raphson step
    dfx = feval(dfunc,x);
    if abs(dfx) == 0; dx = b - a;
    else; dx = -fx/dfx;
    end
    x = x + dx;
    % If x not in bracket, use bisection
    if (b - x)*(x - a) < 0.0
        dx = (b - a)/2.0;
        x = a + dx;
    end
    % Check for convergence
    if abs(dx) < tol*max(b,1.0)
        root = x; return
    end
end
root = NaN;


EXAMPLE 4.6 

A root of $f(x) = x^3 - 10x^2 + 5 = 0$ lies close to $x = 0.7$. Compute this root with the
Newton-Raphson method.

Solution The derivative of the function is $f'(x) = 3x^2 - 20x$, so that the Newton-Raphson
formula in Eq. (4.3) is

$$x \leftarrow x - \frac{f(x)}{f'(x)} = x - \frac{x^3 - 10x^2 + 5}{3x^2 - 20x} = \frac{2x^3 - 10x^2 - 5}{x(3x - 20)}$$

It takes only two iterations to reach five decimal place accuracy:

$$x \leftarrow \frac{2(0.7)^3 - 10(0.7)^2 - 5}{0.7\,[3(0.7) - 20]} = 0.735\,36$$

$$x \leftarrow \frac{2(0.735\,36)^3 - 10(0.735\,36)^2 - 5}{0.735\,36\,[3(0.735\,36) - 20]} = 0.734\,60$$


EXAMPLE 4.7 

Find the smallest positive zero of

$$f(x) = x^4 - 6.4x^3 + 6.45x^2 + 20.538x - 31.752$$

Solution

Inspecting the plot of the function, we suspect that the smallest positive zero is a
double root near $x = 2$. Bisection and Brent’s method would not work here, since they
depend on the function changing its sign at the root. The same argument applies to
the function newtonRaphson. But there is no reason why the unrefined version of the
Newton-Raphson method should not succeed. We used the following program, which
prints the number of iterations in addition to the root:

function [root,numIter] = newton_simple(func,dfunc,x,tol)
% Simple version of Newton-Raphson method used in Example 4.7.

% tol is the fourth argument, so the default applies when fewer than 4 are given.
if nargin < 4; tol = 1.0e6*eps; end
for i = 1:30
    dx = -feval(func,x)/feval(dfunc,x);
    x = x + dx;
    if abs(dx) < tol
        root = x; numIter = i; return
    end
end
root = NaN; numIter = i;

The two functions called by the program are

function y = fex4_7(x)
% Function used in Example 4.7.
y = x^4 - 6.4*x^3 + 6.45*x^2 + 20.538*x - 31.752;

function y = dfex4_7(x)
% Function used in Example 4.7.
y = 4.0*x^3 - 19.2*x^2 + 12.9*x + 20.538;

Here are the results:

>> [root,numIter] = newton_simple(@fex4_7,@dfex4_7,2.0)
root =
    2.1000
numIter =
    27

It can be shown that near a multiple root the convergence of the Newton-Raphson
method is linear, rather than quadratic, which explains the large number of iterations.
Convergence to a multiple root can be sped up by replacing the Newton-Raphson
formula in Eq. (4.3) with

$$x_{i+1} = x_i - m\,\frac{f(x_i)}{f'(x_i)}$$

where $m$ is the multiplicity of the root ($m = 2$ in this problem). After making the
change in the above program, we obtained the result in 5 iterations.
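
The modified formula can be tried with a few lines of code. The sketch below is not the program used in the text: it uses anonymous functions, and its tolerance of 1.0e-6 is a looser, hypothetical choice, so the iteration count it reports need not match the figure quoted above.

```matlab
f  = @(x) x^4 - 6.4*x^3 + 6.45*x^2 + 20.538*x - 31.752;
df = @(x) 4.0*x^3 - 19.2*x^2 + 12.9*x + 20.538;
m = 2;                  % assumed multiplicity of the root
x = 2.0;                % starting value used in Example 4.7
tol = 1.0e-6;           % hypothetical tolerance on |dx|
numIter = 0;
for i = 1:30
    dx = -m*f(x)/df(x); % Newton-Raphson step scaled by the multiplicity
    x = x + dx;
    numIter = i;
    if abs(dx) < tol; break; end
end
fprintf('root = %.4f after %d iterations\n', x, numIter);
```

Near a double root both the function and its derivative approach zero, so roundoff limits the attainable accuracy; a very tight tolerance is therefore not advisable in this sketch.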


4.6 Systems of Equations 
Introduction 

Up to this point, we confined our attention to solving the single equation f(x) = 0. 
Let us now consider the n-dimensional version of the same problem, namely 

$$\mathbf{f}(\mathbf{x}) = \mathbf{0}$$

or, using scalar notation

$$\begin{aligned}
f_1(x_1, x_2, \ldots, x_n) &= 0 \\
f_2(x_1, x_2, \ldots, x_n) &= 0 \\
&\;\;\vdots \\
f_n(x_1, x_2, \ldots, x_n) &= 0
\end{aligned} \tag{4.4}$$

The solution of n simultaneous, nonlinear equations is a much more formidable task 
than finding the root of a single equation. The trouble is the lack of a reliable method for 
bracketing the solution vector x. Therefore, we cannot provide the solution algorithm 
with a guaranteed good starting value of x, unless such a value is suggested by the 
physics of the problem. 

The simplest and the most effective means of computing x is the Newton- 
Raphson method. It works well with simultaneous equations, provided that it is sup¬ 
plied with a good starting point. There are other methods that have better global con¬ 
vergence characteristics, but all of them are variants of the Newton-Raphson method. 


Newton-Raphson Method 

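In order to derive the Newton-Raphson method for a system of equations, we start
with the Taylor series expansion of $f_i(\mathbf{x})$ about the point $\mathbf{x}$:

$$f_i(\mathbf{x} + \Delta\mathbf{x}) = f_i(\mathbf{x}) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}\,\Delta x_j + O(\Delta x^2) \tag{4.5a}$$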
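
Dropping terms of order $\Delta x^2$, we can write Eq. (4.5a) as

$$\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) = \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x})\,\Delta\mathbf{x} \tag{4.5b}$$

where $\mathbf{J}(\mathbf{x})$ is the Jacobian matrix (of size $n \times n$) made up of the partial derivatives

$$J_{ij} = \frac{\partial f_i}{\partial x_j} \tag{4.6}$$

Note that Eq. (4.5b) is a linear approximation (vector $\Delta\mathbf{x}$ being the variable) of the
vector-valued function $\mathbf{f}$ in the vicinity of point $\mathbf{x}$.

Let us now assume that $\mathbf{x}$ is the current approximation of the solution of $\mathbf{f}(\mathbf{x}) = \mathbf{0}$,
and let $\mathbf{x} + \Delta\mathbf{x}$ be the improved solution. To find the correction $\Delta\mathbf{x}$, we set
$\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) = \mathbf{0}$ in Eq. (4.5b). The result is a set of linear equations for $\Delta\mathbf{x}$:

$$\mathbf{J}(\mathbf{x})\,\Delta\mathbf{x} = -\mathbf{f}(\mathbf{x}) \tag{4.7}$$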

The following steps constitute the Newton-Raphson method for simultaneous, 
nonlinear equations: 

1. Estimate the solution vector x. 

2. Evaluate f(x). 

3. Compute the Jacobian matrix J(x) from Eq. (4.6). 

4. Set up the simultaneous equations in Eq. (4.7) and solve for Ax. 

5. Let x <- x + Ax and repeat steps 2-5. 

The above process is continued until $|\Delta\mathbf{x}| < \varepsilon$, where $\varepsilon$ is the error tolerance. As
in the one-dimensional case, success of the Newton-Raphson procedure depends 
entirely on the initial estimate of x. If a good starting point is used, convergence to the 
solution is very rapid. Otherwise, the results are unpredictable. 
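
The five steps can be sketched in a few lines for a small system with an analytical Jacobian. The system used here (a circle of radius 2 intersected with the line $x_1 = x_2$) is hypothetical and chosen only because its solution is known; the tolerance and starting point are likewise assumptions.

```matlab
% Hypothetical system: x1^2 + x2^2 - 4 = 0 and x1 - x2 = 0.
f = @(x) [x(1)^2 + x(2)^2 - 4.0; x(1) - x(2)];
J = @(x) [2.0*x(1), 2.0*x(2); 1.0, -1.0];   % analytical Jacobian
x = [1.0; 2.0];            % step 1: estimate of the solution vector
tol = 1.0e-10;             % hypothetical error tolerance
for i = 1:30
    dx = J(x)\(-f(x));     % steps 2-4: solve J(x)*dx = -f(x)
    x = x + dx;            % step 5
    if norm(dx) < tol; break; end
end
disp(x')
```

For this system the iterates should approach $x_1 = x_2 = \sqrt{2}$.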

Because analytical derivation of each $\partial f_i/\partial x_j$ can be difficult or impractical, it is
preferable to let the computer calculate the partial derivatives from the finite difference
approximation

$$\frac{\partial f_i}{\partial x_j} \approx \frac{f_i(\mathbf{x} + \mathbf{e}_j h) - f_i(\mathbf{x})}{h} \tag{4.8}$$

where $h$ is a small increment and $\mathbf{e}_j$ represents a unit vector in the direction of $x_j$.
This formula can be obtained from Eq. (4.5a) after dropping the terms of order $\Delta x^2$
and setting $\Delta\mathbf{x} = \mathbf{e}_j h$. By using the finite difference approximation, we also avoid the
tedium of typing the expressions for $\partial f_i/\partial x_j$ into the computer code.
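
To get a feel for the accuracy of Eq. (4.8), the following sketch builds the Jacobian column by column from forward differences and compares it with the exact derivatives. The two-equation system and the evaluation point are hypothetical choices for this illustration.

```matlab
% Hypothetical system used only to test the approximation.
f = @(x) [x(1)^2 + x(2)^2 - 4.0; x(1)*x(2) - 1.0];
x = [0.5; 1.5];
h = 1.0e-4;                          % small increment
n = length(x);
f0 = f(x);
jacFD = zeros(n);
for j = 1:n
    xh = x; xh(j) = xh(j) + h;       % x + e_j*h
    jacFD(:,j) = (f(xh) - f0)/h;     % column j of the Jacobian, Eq. (4.8)
end
jacExact = [2.0*x(1), 2.0*x(2); x(2), x(1)];
fprintf('largest error = %.2e\n', max(max(abs(jacFD - jacExact))));
```

The error is of the order of h here, because the forward difference of a squared term overshoots its derivative by exactly h.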

■ newtonRaphson2 


This function is an implementation of the Newton-Raphson method. The nested function
jacobian computes the Jacobian matrix from the finite difference approximation
in Eq. (4.8). The simultaneous equations in Eq. (4.7) are solved by using the left division
operator of MATLAB. The function subroutine func that returns the array f(x)
must be supplied by the user.

function root = newtonRaphson2(func,x,tol)
% Newton-Raphson method of finding a root of simultaneous
% equations fi(x1,x2,...,xn) = 0, i = 1,2,...,n.
% USAGE: root = newtonRaphson2(func,x,tol)
% INPUT:
% func = handle of function that returns [f1,f2,...,fn].
% x = starting solution vector [x1,x2,...,xn].
% tol = error tolerance (default is 1.0e4*eps).
% OUTPUT:
% root = solution vector.

if nargin == 2; tol = 1.0e4*eps; end
if size(x,1) == 1; x = x'; end % x must be column vector
for i = 1:30
    [jac,f0] = jacobian(func,x);
    if sqrt(dot(f0,f0)/length(x)) < tol
        root = x; return
    end
    dx = jac\(-f0);
    x = x + dx;
    if sqrt(dot(dx,dx)/length(x)) < tol*max(abs(x),1.0)
        root = x; return
    end
end
error('Too many iterations')

function [jac,f0] = jacobian(func,x)
% Returns the Jacobian matrix and f(x).
h = 1.0e-4;
n = length(x);
jac = zeros(n);
f0 = feval(func,x);
for i = 1:n
    temp = x(i);
    x(i) = temp + h;
    f1 = feval(func,x);
    x(i) = temp;
    jac(:,i) = (f1 - f0)/h;
end

Note that the Jacobian matrix J(x) is recomputed in each iterative loop. Since each 
calculation of J(x) involves n + 1 evaluations of f(x) (n is the number of equations), 
the expense of computation can be high depending on n and the complexity of f(x). 
It is often possible to save computer time by neglecting the changes in the Jacobian 
matrix between iterations, thus computing J(x) only once. This will work provided 
that the initial x is sufficiently close to the solution. 
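
The idea of computing J(x) only once can be sketched as follows. This is a minimal illustration, not a replacement for newtonRaphson2: the test system (a circle and a hyperbola), the starting point and the names f, J, J0 and numIter are assumptions chosen for the example.

```matlab
% Sketch: Newton-Raphson iterations with the Jacobian computed only once.
% The test system and all names are illustrative assumptions.
f = @(v) [v(1)^2 + v(2)^2 - 3; v(1)*v(2) - 1];
J = @(v) [2*v(1), 2*v(2); v(2), v(1)];

x = [0.5; 1.5];            % starting point, assumed close to a solution
J0 = J(x);                 % Jacobian evaluated at the starting point only
numIter = 0;
for k = 1:50
    dx = J0\(-f(x));       % the same matrix is reused in every iteration
    x = x + dx;
    numIter = k;
    if norm(dx) < 1.0e-10; break; end
end
```

With a frozen Jacobian the convergence is generally linear rather than quadratic, so more iterations are needed, but each iteration costs only one evaluation of f(x).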


EXAMPLE 4.8 

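Determine the points of intersection between the circle $x^2 + y^2 = 3$ and the
hyperbola $xy = 1$.

Solution The equations to be solved are

$$f_1(x,y) = x^2 + y^2 - 3 = 0 \tag{a}$$

$$f_2(x,y) = xy - 1 = 0 \tag{b}$$

The Jacobian matrix is

$$\mathbf{J}(x,y) = \begin{bmatrix} \partial f_1/\partial x & \partial f_1/\partial y \\ \partial f_2/\partial x & \partial f_2/\partial y \end{bmatrix} = \begin{bmatrix} 2x & 2y \\ y & x \end{bmatrix}$$

Thus the linear equations $\mathbf{J}(\mathbf{x})\Delta\mathbf{x} = -\mathbf{f}(\mathbf{x})$ associated with the Newton-Raphson
method are

$$\begin{bmatrix} 2x & 2y \\ y & x \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} -x^2 - y^2 + 3 \\ -xy + 1 \end{bmatrix} \tag{c}$$
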

By plotting the circle and the hyperbola, we see that there are four points of
intersection. It is sufficient, however, to find only one of these points, as the others can
be deduced from symmetry. From the plot we also get a rough estimate of the
coordinates of an intersection point: x = 0.5, y = 1.5, which we use as the starting values.

The computations then proceed as follows.

First iteration Substituting x = 0.5, y = 1.5 in Eq. (c), we get

$$\begin{bmatrix} 1.0 & 3.0 \\ 1.5 & 0.5 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} 0.50 \\ 0.25 \end{bmatrix}$$

the solution of which is $\Delta x = \Delta y = 0.125$. Therefore, the improved coordinates of the
intersection point are

x = 0.5 + 0.125 = 0.625    y = 1.5 + 0.125 = 1.625

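Second iteration Repeating the procedure using the latest values of x and y, we
obtain

$$\begin{bmatrix} 1.250 & 3.250 \\ 1.625 & 0.625 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} -0.031250 \\ -0.015625 \end{bmatrix}$$

which yields $\Delta x = \Delta y = -0.00694$. Thus

x = 0.625 - 0.00694 = 0.61806    y = 1.625 - 0.00694 = 1.61806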


Third iteration Substitution of the latest x and y into Eq. (c) yields

$$\begin{bmatrix} 1.23612 & 3.23612 \\ 1.61806 & 0.61806 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} -0.000116 \\ -0.000058 \end{bmatrix}$$

The solution is $\Delta x = \Delta y = -0.00003$, so that

x = 0.61806 - 0.00003 = 0.61803
y = 1.61806 - 0.00003 = 1.61803

Subsequent iterations would not change the results within five significant figures.
Therefore, the coordinates of the four intersection points are

±(0.61803, 1.61803) and ±(1.61803, 0.61803)


Alternate solution If there are only a few equations, it may be possible to eliminate
all but one of the unknowns. Then we would be left with a single equation which
can be solved by the methods described in Arts. 4.2-4.5. In this problem, we obtain
from Eq. (b)

$$y = \frac{1}{x}$$

which upon substitution into Eq. (a) yields $x^2 + 1/x^2 - 3 = 0$, or

$$x^4 - 3x^2 + 1 = 0$$

The solutions of this biquadratic equation, x = ±0.61803 and ±1.61803, agree with
the results obtained by the Newton-Raphson method.

EXAMPLE 4.9

Find a solution of

$$\sin x + y^2 + \ln z - 7 = 0$$
$$3x + 2^y - z^3 + 1 = 0$$
$$x + y + z - 5 = 0$$

using newtonRaphson2. Start with the point (1,1,1).

Solution Letting $x = x_1$, $y = x_2$ and $z = x_3$, the code defining the function array f(x) is

function y = fex4_9(x)
% Function used in Example 4.9
y = [sin(x(1)) + x(2)^2 + log(x(3)) - 7; ...
     3*x(1) + 2^x(2) - x(3)^3 + 1; ...
     x(1) + x(2) + x(3) - 5];

The solution can now be obtained with the single command

>> newtonRaphson2(@fex4_9,[1;1;1])

which results in

ans =
    0.5991
    2.3959
    2.0050

Hence the solution is x = 0.5991, y = 2.3959 and z = 2.0050.

PROBLEM SET 4.1 

1. Use the Newton-Raphson method and a four-function calculator (+ − × ÷
operations only) to compute $\sqrt[3]{75}$ with four significant figure accuracy.

2. Find the smallest positive (real) root of $x^3 - 3.23x^2 - 5.54x + 9.84 = 0$ by the
method of bisection.

3. The smallest positive, nonzero root of $\cosh x \cos x - 1 = 0$ lies in the interval (4, 5).
Compute this root by Brent's method.

4. Solve Prob. 3 by the Newton-Raphson method. 

5. A root of the equation tan x — tanhx = 0 lies in (7.0, 7.4). Find this root with three 
decimal place accuracy by the method of bisection. 

6. Determine the two roots of $\sin x + 3\cos x - 2 = 0$ that lie in the interval (−2, 2).
Use the Newton-Raphson method.

7. A popular method of hand computation is the secant formula where the improved
estimate of the root ($x_{i+1}$) is obtained by linear interpolation based on two previous
estimates ($x_i$ and $x_{i-1}$):

$$x_{i+1} = x_i - \frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}\, f(x_i)$$

Solve Prob. 6 using the secant formula.

8. Draw a plot of $f(x) = \cosh x \cos x - 1$ in the range $4 \le x \le 8$. (a) Verify from the
plot that the smallest positive, nonzero root of $f(x) = 0$ lies in the interval (4, 5).
(b) Show graphically that the Newton-Raphson formula would not converge to
this root if it is started with x = 4.

9. The equation $x^3 - 1.2x^2 - 8.19x + 13.23 = 0$ has a double root close to x = 2.
Determine this root with the Newton-Raphson method within four decimal places.

10. ■ Write a program that computes all the roots of $f(x) = 0$ in a given interval with
Brent's method. Utilize the functions rootsearch and brent. You may use the
program in Example 4.3 as a model. Test the program by finding the roots of
$x\sin x + 3\cos x - x = 0$ in (−6, 6).

11. ■ Solve Prob. 10 with the Newton-Raphson method.

12. ■ Determine all real roots of $x^4 + 0.9x^3 - 2.3x^2 + 3.6x - 25.2 = 0$.

13. ■ Compute all positive real roots of $x^4 + 2x^3 - 7x^2 + 3 = 0$.

14. ■ Find all positive, nonzero roots of $\sin x - 0.1x = 0$.

15. ■ The natural frequencies of a uniform cantilever beam are related to the roots
$\beta_i$ of the frequency equation $f(\beta) = \cosh\beta\cos\beta + 1 = 0$, where

$$\beta_i^4 = (2\pi f_i)^2\,\frac{mL^3}{EI}$$

$f_i$ = ith natural frequency (cps)
m = mass of the beam
L = length of the beam
E = modulus of elasticity
I = moment of inertia of the cross section

Determine the lowest two frequencies of a steel beam 0.9 m long, with a
rectangular cross section 25 mm wide and 2.5 mm high. The mass density of steel is
7850 kg/m³ and E = 200 GPa.

16. ■

A steel cable of length s is suspended as shown in the figure. The maximum tensile
stress in the cable, which occurs at the supports, is

$$\sigma_{\max} = \sigma_0 \cosh\beta$$

where

$$\beta = \frac{\gamma L}{2\sigma_0}$$

$\sigma_0$ = tensile stress in the cable at O
$\gamma$ = weight of the cable per unit volume
L = horizontal span of the cable

The length to span ratio of the cable is related to $\beta$ by

$$\frac{s}{L} = \frac{1}{\beta}\sinh\beta$$

Find $\sigma_{\max}$ if $\gamma = 77 \times 10^3$ N/m³ (steel), L = 1000 m and s = 1100 m.

17. ■ 



The aluminum W310 × 202 (wide flange) column is subjected to an eccentric axial
load P as shown. The maximum compressive stress in the column is given by the
so-called secant formula:

$$\sigma_{\max} = \bar{\sigma}\left[1 + \frac{ec}{r^2}\sec\left(\frac{L}{2r}\sqrt{\frac{\bar{\sigma}}{E}}\right)\right]$$

where

$\bar{\sigma}$ = P/A = average stress
A = 25 800 mm² = cross-sectional area of the column
e = 85 mm = eccentricity of the load
c = 170 mm = half-depth of the column
r = 142 mm = radius of gyration of the cross section
L = 7100 mm = length of the column
E = 71 × 10⁹ Pa = modulus of elasticity

Determine the maximum load P that the column can carry if the maximum stress
is not to exceed 120 × 10⁶ Pa.

18. ■

Bernoulli's equation for fluid flow in an open channel with a small bump is

$$\frac{Q^2}{2gb^2h_0^2} + h_0 = \frac{Q^2}{2gb^2h^2} + h + H$$

where

Q = 1.2 m³/s = volume rate of flow
g = 9.81 m/s² = gravitational acceleration
b = 1.8 m = width of channel
$h_0$ = 0.6 m = upstream water level
H = 0.075 m = height of bump
h = water level above the bump

Determine h.

19. ■ The speed v of a Saturn V rocket in vertical flight near the surface of earth can
be approximated by

$$v = u\ln\frac{M_0}{M_0 - \dot{m}t} - gt$$

where

u = 2510 m/s = velocity of exhaust relative to the rocket
$M_0$ = 2.8 × 10⁶ kg = mass of rocket at liftoff
$\dot{m}$ = 13.3 × 10³ kg/s = rate of fuel consumption
g = 9.81 m/s² = gravitational acceleration
t = time measured from liftoff

Determine the time when the rocket reaches the speed of sound (335 m/s).

20. ■

[Figure: thermodynamic cycle of the engine, including an isothermal expansion leg]

The figure shows the thermodynamic cycle of an engine. The efficiency of this
engine for monatomic gas is

$$\eta = \frac{\ln(T_2/T_1) - (1 - T_1/T_2)}{\ln(T_2/T_1) + (1 - T_1/T_2)/(\gamma - 1)}$$

where T is the absolute temperature and $\gamma = 5/3$. Find $T_2/T_1$ that results in 30%
efficiency ($\eta = 0.3$).

21. ■ Gibbs free energy of one mole of hydrogen at temperature T is

$$G = -RT\ln\left[(T/T_0)^{5/2}\right] \text{ J}$$

where R = 8.314 41 J/K is the gas constant and $T_0$ = 4.444 18 K. Determine the
temperature at which $G = -10^5$ J.

22. ■ The chemical equilibrium equation in the production of methanol from CO
and H$_2$ is$^8$

$$\frac{\xi(3 - 2\xi)^2}{(1 - \xi)^3} = 249.2$$

where $\xi$ is the equilibrium extent of the reaction. Determine $\xi$.

23. ■ Determine the coordinates of the two points where the circles $(x - 2)^2 + y^2 = 4$
and $x^2 + (y - 3)^2 = 4$ intersect. Start by estimating the locations of the points from
a sketch of the circles, and then use the Newton-Raphson method to compute
the coordinates.


24. ■ The equations 


sin x + 3 cos x - 2 = 0 
cos x — sin y + 0.2 = 0 

have a solution in the vicinity of the point (1,1). Use the Newton-Raphson method 
to refine the solution. 


8 From Alberty, R.A., Physical Chemistry, 7th ed., Wiley, 1987.

25. ■ Use any method to find all real solutions in 0 < x < 1.5 of the simultaneous
equations

$$\tan x - y = 1$$
$$\cos x - 3\sin y = 0$$

26. ■ The equation of a circle is

$$(x - a)^2 + (y - b)^2 = R^2$$

where R is the radius and (a, b) are the coordinates of the center. If the coordinates
of three points on the circle are

  x    8.21    0.34    5.96
  y    0.00    6.62   -1.12

determine R, a and b.


27. ■

The trajectory of a satellite orbiting the earth is

$$R = \frac{C}{1 + e\sin(\theta + \alpha)}$$

where $(R, \theta)$ are the polar coordinates of the satellite, and C, e and $\alpha$ are constants
(e is known as the eccentricity of the orbit). If the satellite was observed at the
following three positions

  θ         -30°     0°      30°
  R (km)    6870     6728    6615

determine the smallest R of the trajectory and the corresponding value of $\theta$.


28. ■

A projectile is launched at O with the velocity v at the angle $\theta$ to the horizontal.
The parametric equation of the trajectory is

$$x = (v\cos\theta)t$$

$$y = -\frac{1}{2}gt^2 + (v\sin\theta)t$$

where t is the time measured from the instant of launch, and g = 9.81 m/s²
represents the gravitational acceleration. If the projectile is to hit the target at the 45°
angle shown in the figure, determine v, $\theta$ and the time of flight.

29. ■

The three angles shown in the figure of the four-bar linkage are related by

$$150\cos\theta_1 + 180\cos\theta_2 - 200\cos\theta_3 = 200$$
$$150\sin\theta_1 + 180\sin\theta_2 - 200\sin\theta_3 = 0$$

Determine $\theta_1$ and $\theta_2$ when $\theta_3 = 75°$. Note that there are two solutions.


*4.7 Zeroes of Polynomials 
Introduction 

A polynomial of degree n has the form

$$P_n(x) = a_1x^n + a_2x^{n-1} + \cdots + a_nx + a_{n+1} \tag{4.9}$$

where the coefficients $a_i$ may be real or complex. We will concentrate on
polynomials with real coefficients, but the algorithms presented in this article also work with
complex coefficients.

The polynomial equation $P_n(x) = 0$ has exactly n roots, which may be real or
complex. If the coefficients are real, the complex roots always occur in conjugate pairs
$(x_r + ix_i,\ x_r - ix_i)$, where $x_r$ and $x_i$ are the real and imaginary parts, respectively. For
real coefficients, the number of real roots can be estimated from the rule of Descartes:

• The number of positive, real roots equals the number of sign changes in the
expression for $P_n(x)$, or less by an even number.

• The number of negative, real roots is equal to the number of sign changes in
$P_n(-x)$, or less by an even number.

As an example, consider $P_3(x) = x^3 - 2x^2 - 8x + 27$. Since the sign changes
twice, $P_3(x) = 0$ has either two or zero positive real roots. On the other hand,
$P_3(-x) = -x^3 - 2x^2 + 8x + 27$ contains a single sign change; hence $P_3(x)$ possesses
one negative real zero.

The real zeroes of polynomials with real coefficients can always be computed by
one of the methods already described. But if complex roots are to be computed, it is
best to use a method that specializes in polynomials. Here we present a method due to
Laguerre, which is reliable and simple to implement. Before proceeding to Laguerre's
method, we must first develop two numerical tools that are needed in any method
capable of determining the zeroes of a polynomial. The first of these is an efficient
algorithm for evaluating a polynomial and its derivatives. The second algorithm we
need is for the deflation of a polynomial, i.e., for dividing $P_n(x)$ by $x - r$, where r is
a root of $P_n(x) = 0$.
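
The sign-change count in the rule of Descartes is easy to mechanize. The sketch below applies it to the coefficients of the cubic used in the example above; the helper countChanges and the variable names are assumptions introduced for this illustration.

```matlab
% Sketch: counting sign changes for the rule of Descartes.
% countChanges and all variable names are illustrative assumptions.
a = [1 -2 -8 27];                 % coefficients of P3(x), highest power first
n = length(a) - 1;

% Number of sign changes in a coefficient sequence (zeros are skipped)
countChanges = @(c) sum(diff(sign(c(c ~= 0))) ~= 0);

aNeg = a .* (-1).^(n:-1:0);       % coefficients of P3(-x)
nPos = countChanges(a);           % upper bound on positive real roots
nNeg = countChanges(aNeg);        % upper bound on negative real roots
```

Here nPos and nNeg are only upper bounds; the actual number of roots may be smaller by an even number.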

Evaluation of Polynomials 

It is tempting to evaluate the polynomial in Eq. (4.9) from left to right by the following
algorithm (we assume that the coefficients are stored in the array a):

p = 0.0;
for i = 1:n+1
    p = p + a(i)*x^(n-i+1);
end

Since $x^k$ is evaluated as $x \times x \times \cdots \times x$ ($k - 1$ multiplications), we deduce that the
number of multiplications in this algorithm is

$$1 + 2 + 3 + \cdots + (n - 1) = \frac{1}{2}n(n - 1)$$

If n is large, the number of multiplications can be reduced considerably if we evaluate
the polynomial from right to left. For an example, take

$$P_4(x) = a_1x^4 + a_2x^3 + a_3x^2 + a_4x + a_5$$

which can be rewritten as

$$P_4(x) = a_5 + x\{a_4 + x[a_3 + x(a_2 + xa_1)]\}$$

We now see that an efficient computational sequence for evaluating the
polynomial is

$$P_0(x) = a_1$$
$$P_1(x) = a_2 + xP_0(x)$$
$$P_2(x) = a_3 + xP_1(x)$$
$$P_3(x) = a_4 + xP_2(x)$$
$$P_4(x) = a_5 + xP_3(x)$$





For a polynomial of degree n, the procedure can be summarized as

$$P_0(x) = a_1$$
$$P_i(x) = a_{i+1} + xP_{i-1}(x), \quad i = 1, 2, \ldots, n \tag{4.10}$$

leading to the algorithm

p = a(1);
for i = 1:n
    p = p*x + a(i+1);
end

The last algorithm involves only n multiplications, making it more efficient for
n > 3. But computational economy is not the prime reason why this algorithm should
be used. Because the result of each multiplication is rounded off, the procedure with
the least number of multiplications generally accumulates the smallest roundoff
error.

Some root-finding algorithms, including Laguerre's method, also require
evaluation of the first and second derivatives of $P_n(x)$. From Eq. (4.10) we obtain by
differentiation

$$P_0'(x) = 0 \qquad P_i'(x) = P_{i-1}(x) + xP_{i-1}'(x), \quad i = 1, 2, \ldots, n \tag{4.11a}$$

$$P_0''(x) = 0 \qquad P_i''(x) = 2P_{i-1}'(x) + xP_{i-1}''(x), \quad i = 1, 2, \ldots, n \tag{4.11b}$$
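
The recurrences in Eqs. (4.10) and (4.11) can be checked numerically against derivatives written out by hand. The following sketch does this for one cubic at a complex argument; the polynomial, the evaluation point and the names pRef, dpRef and ddpRef are assumptions chosen for the illustration.

```matlab
% Sketch: Eqs. (4.10) and (4.11) evaluated side by side and compared with
% hand-differentiated expressions. Polynomial and names are illustrative.
a = [1 -4 -4.48 26.1];            % x^3 - 4x^2 - 4.48x + 26.1
x = 3 - 1i;                       % evaluation point (may be complex)
n = length(a) - 1;

p = a(1); dp = 0.0; ddp = 0.0;    % P_0, P_0', P_0''
for k = 1:n
    ddp = ddp*x + 2.0*dp;         % Eq. (4.11b), uses the old dp
    dp  = dp*x + p;               % Eq. (4.11a), uses the old p
    p   = p*x + a(k+1);           % Eq. (4.10)
end

% Reference values from the explicit polynomial and its derivatives
pRef   = x^3 - 4*x^2 - 4.48*x + 26.1;
dpRef  = 3*x^2 - 8*x - 4.48;
ddpRef = 6*x - 8;
```

Note the order of the three updates inside the loop: each line must use the values of the lower derivatives from the previous pass, so ddp is updated before dp, and dp before p.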


■ evalPoly 

Here is the function that evaluates a polynomial and its derivatives: 

function [p,dp,ddp] = evalpoly(a,x)
% Evaluates the polynomial
% p = a(1)*x^n + a(2)*x^(n-1) + ... + a(n+1)
% and its first two derivatives dp and ddp.
% USAGE: [p,dp,ddp] = evalpoly(a,x)

n = length(a) - 1;
p = a(1); dp = 0.0; ddp = 0.0;
for i = 1:n
    ddp = ddp*x + 2.0*dp;
    dp = dp*x + p;
    p = p*x + a(i+1);
end

Deflation of Polynomials

After a root r of $P_n(x) = 0$ has been computed, it is desirable to factor the polynomial
as follows:

$$P_n(x) = (x - r)P_{n-1}(x) \tag{4.12}$$

This procedure, known as deflation or synthetic division, involves nothing more than
computing the coefficients of $P_{n-1}(x)$. Since the remaining zeros of $P_n(x)$ are also the
zeros of $P_{n-1}(x)$, the root-finding procedure can now be applied to $P_{n-1}(x)$ rather than
$P_n(x)$. Deflation thus makes it progressively easier to find successive roots, because
the degree of the polynomial is reduced every time a root is found. Moreover, by
eliminating the roots that have already been found, the chances of computing the
same root more than once are eliminated.

If we let

$$P_{n-1}(x) = b_1x^{n-1} + b_2x^{n-2} + \cdots + b_{n-1}x + b_n$$

then Eq. (4.12) becomes

$$a_1x^n + a_2x^{n-1} + \cdots + a_nx + a_{n+1} = (x - r)(b_1x^{n-1} + b_2x^{n-2} + \cdots + b_{n-1}x + b_n)$$

Equating the coefficients of like powers of x, we obtain

$$b_1 = a_1 \qquad b_2 = a_2 + rb_1 \qquad \cdots \qquad b_n = a_n + rb_{n-1} \tag{4.13}$$

which leads to Horner's deflation algorithm:

b(1) = a(1);
for i = 2:n
    b(i) = a(i) + r*b(i-1);
end
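
A quick way to gain confidence in a deflation is to multiply the result back by (x − r) and compare with the original coefficients. The sketch below does this for a quartic with the known root r = 2; the polynomial and the names remainder and check are assumptions made for the illustration, and conv is used only to multiply the two coefficient arrays.

```matlab
% Sketch: Horner's deflation, Eq. (4.13), followed by a multiplication check.
% The polynomial, the root and the variable names are illustrative assumptions.
a = [1 -5 -9 155 -250];           % x^4 - 5x^3 - 9x^2 + 155x - 250
r = 2;                            % a known root of the polynomial
n = length(a) - 1;

b = zeros(1,n);
b(1) = a(1);
for i = 2:n
    b(i) = a(i) + r*b(i-1);
end

remainder = a(n+1) + r*b(n);      % vanishes when r is an exact root
check = conv([1 -r], b);          % coefficients of (x - r)*P_{n-1}(x)
```

If r is only an approximate root, the remainder is small but nonzero, which is one source of the deflation errors discussed later in this article.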


Laguerre's Method 

Laguerre's formulas are not easily derived for a general polynomial $P_n(x)$. However, the
derivation is greatly simplified if we consider the special case where the polynomial
has a zero at $x = r$ and $(n - 1)$ zeros at $x = q$. If the zeros were known, this polynomial
could be written as

$$P_n(x) = (x - r)(x - q)^{n-1} \tag{a}$$

Our problem is now this: given the polynomial in Eq. (a) in the form

$$P_n(x) = a_1x^n + a_2x^{n-1} + \cdots + a_nx + a_{n+1}$$

determine r (note that q is also unknown). It turns out that the result, which is
exact for the special case considered here, works well as an iterative formula with any
polynomial.

Differentiating Eq. (a) with respect to x, we get

$$P_n'(x) = (x - q)^{n-1} + (n - 1)(x - r)(x - q)^{n-2} = P_n(x)\left[\frac{1}{x - r} + \frac{n - 1}{x - q}\right]$$

Thus

$$\frac{P_n'(x)}{P_n(x)} = \frac{1}{x - r} + \frac{n - 1}{x - q} \tag{b}$$

which upon differentiation yields

$$\frac{P_n''(x)}{P_n(x)} - \left[\frac{P_n'(x)}{P_n(x)}\right]^2 = -\frac{1}{(x - r)^2} - \frac{n - 1}{(x - q)^2} \tag{c}$$

It is convenient to introduce the notation

$$G(x) = \frac{P_n'(x)}{P_n(x)} \qquad H(x) = G^2(x) - \frac{P_n''(x)}{P_n(x)} \tag{4.14}$$

so that Eqs. (b) and (c) become

$$G(x) = \frac{1}{x - r} + \frac{n - 1}{x - q} \tag{4.15a}$$

$$H(x) = \frac{1}{(x - r)^2} + \frac{n - 1}{(x - q)^2} \tag{4.15b}$$

If we solve Eq. (4.15a) for $x - q$ and substitute the result into Eq. (4.15b), we obtain a
quadratic equation for $x - r$. The solution of this equation is Laguerre's formula

$$x - r = \frac{n}{G(x) \pm \sqrt{(n - 1)\left[nH(x) - G^2(x)\right]}} \tag{4.16}$$

The procedure for finding a zero of a general polynomial by Laguerre's formula is:

1. Let x be a guess for the root of $P_n(x) = 0$ (any value will do).

2. Evaluate $P_n(x)$, $P_n'(x)$ and $P_n''(x)$ using the procedure outlined in Eqs. (4.10)
and (4.11).

3. Compute G(x) and H(x) from Eqs. (4.14).

4. Determine the improved root r from Eq. (4.16) choosing the sign that results in the
larger magnitude of the denominator (this can be shown to improve convergence).

5. Let $x \leftarrow r$ and repeat steps 2-5 until $|P_n(x)| < \varepsilon$ or $|x - r| < \varepsilon$, where $\varepsilon$ is the error
tolerance.

One nice property of Laguerre's method is that it converges to a root, with very few
exceptions, from any starting value of x.
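
The five steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the polyRoots implementation: the cubic, the starting value, the tolerance and the names xFirst, G, H and F are chosen here for the example only.

```matlab
% Sketch: Laguerre iterations, Eq. (4.16), for a single root.
% Polynomial, starting value, tolerance and names are illustrative assumptions.
a = [1 -4 -4.48 26.1];            % x^3 - 4x^2 - 4.48x + 26.1
n = length(a) - 1;
x = 3 - 1i;                       % step 1: starting guess
tol = 1.0e-10;
xFirst = NaN;                     % will hold the result of the first iteration

for iter = 1:30
    % Step 2: polynomial and its first two derivatives, Eqs. (4.10)-(4.11)
    p = a(1); dp = 0.0; ddp = 0.0;
    for k = 1:n
        ddp = ddp*x + 2.0*dp;
        dp  = dp*x + p;
        p   = p*x + a(k+1);
    end
    if abs(p) < tol; break; end
    % Step 3: G(x) and H(x), Eqs. (4.14)
    G = dp/p;
    H = G*G - ddp/p;
    % Step 4: choose the sign giving the larger denominator in Eq. (4.16)
    F = sqrt((n - 1)*(n*H - G*G));
    if abs(G + F) >= abs(G - F)
        dx = n/(G + F);
    else
        dx = n/(G - F);
    end
    % Step 5: update x and test for convergence
    x = x - dx;
    if iter == 1; xFirst = x; end
    if abs(dx) < tol; break; end
end
```

Because the square root is taken of a complex number in general, the iteration can move off the real axis even when it starts from a real guess; this is what allows the method to find complex roots of polynomials with real coefficients.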


■ polyRoots 

The function polyRoots in this module computes all the roots of $P_n(x) = 0$,
where the polynomial $P_n(x)$ is defined by its coefficient array $\mathbf{a} = [a_1, a_2, a_3, \ldots, a_{n+1}]$.
After the first root is computed by the subfunction laguerre, the polynomial is
deflated using deflPoly and the next zero computed by applying laguerre to the
deflated polynomial. This process is repeated until all n roots have been found.
If a computed root has a very small imaginary part, it is very likely that it
represents roundoff error. Therefore, polyRoots replaces a tiny imaginary part by
zero.

function root = polyroots(a,tol)
% Returns all the roots of the polynomial
% a(1)*x^n + a(2)*x^(n-1) + ... + a(n+1).
% USAGE: root = polyroots(a,tol).
% tol = error tolerance (default is 1.0e-6).

if nargin == 1; tol = 1.0e-6; end
n = length(a) - 1;
root = zeros(n,1);
for i = 1:n
    x = laguerre(a,tol);
    if abs(imag(x)) < tol; x = real(x); end
    root(i) = x;
    a = deflpoly(a,x);
end

function x = laguerre(a,tol)
% Returns a root of the polynomial
% a(1)*x^n + a(2)*x^(n-1) + ... + a(n+1).
x = randn;                        % Start with random number
n = length(a) - 1;
for i = 1:30
    [p,dp,ddp] = evalpoly(a,x);
    if abs(p) < tol; return; end
    g = dp/p; h = g*g - ddp/p;
    f = sqrt((n - 1)*(n*h - g*g));
    if abs(g + f) >= abs(g - f); dx = n/(g + f);
    else; dx = n/(g - f); end
    x = x - dx;
    if abs(dx) < tol; return; end
end
error('Too many iterations in laguerre')

function b = deflpoly(a,r)
% Horner's deflation:
% a(1)*x^n + a(2)*x^(n-1) + ... + a(n+1)
% = (x - r)[b(1)*x^(n-1) + b(2)*x^(n-2) + ... + b(n)].
n = length(a) - 1;
b = zeros(n,1);
b(1) = a(1);
for i = 2:n; b(i) = a(i) + r*b(i-1); end

Since the roots are computed with finite accuracy, each deflation introduces small 
errors in the coefficients of the deflated polynomial. The accumulated roundoff error 
increases with the degree of the polynomial and can become severe if the polynomial is 
ill-conditioned (small changes in the coefficients produce large changes in the roots). 
Hence the results should be viewed with caution when dealing with polynomials of 
high degree. 

The errors caused by deflation can be reduced by recomputing each root using 
the original, undeflated polynomial. The roots obtained previously in conjunction 
with deflation are employed as the starting values. 
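
The recomputation ("polishing") step can be sketched as follows. For brevity the sketch refines each root with Newton-Raphson iterations on the original polynomial rather than with Laguerre's method; that substitution, the slightly perturbed starting values in roots0 and all variable names are assumptions of this example. polyval and polyder are standard MATLAB functions for evaluating and differentiating a polynomial given by its coefficient array.

```matlab
% Sketch: polishing approximate roots against the original polynomial.
% Newton-Raphson is used here instead of Laguerre's method for brevity;
% roots0 stands in for roots obtained with deflation (assumed values).
a  = [1 -5 -9 155 -250];          % original, undeflated polynomial
da = polyder(a);                  % coefficients of its derivative
roots0 = [2.001; 3.999 - 3.001i; 4.001 + 2.999i; -4.998];

polished = roots0;
for k = 1:length(roots0)
    x = roots0(k);
    for iter = 1:20
        dx = polyval(a,x)/polyval(da,x);   % Newton-Raphson correction
        x = x - dx;
        if abs(dx) < 1.0e-12; break; end
    end
    polished(k) = x;
end
```

Since every refinement uses the coefficients of the original polynomial, errors introduced by earlier deflations do not propagate into the polished roots.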

EXAMPLE 4.10 

A zero of the polynomial $P_4(x) = 3x^4 - 10x^3 - 48x^2 - 2x + 12$ is x = 6. Deflate the
polynomial with Horner's algorithm, i.e., find $P_3(x)$ so that $(x - 6)P_3(x) = P_4(x)$.

Solution With r = 6 and n = 4, Eqs. (4.13) become

$$b_1 = a_1 = 3$$
$$b_2 = a_2 + 6b_1 = -10 + 6(3) = 8$$
$$b_3 = a_3 + 6b_2 = -48 + 6(8) = 0$$
$$b_4 = a_4 + 6b_3 = -2 + 6(0) = -2$$

Therefore,

$$P_3(x) = 3x^3 + 8x^2 - 2$$

EXAMPLE 4.11

A root of the equation $P_3(x) = x^3 - 4.0x^2 - 4.48x + 26.1 = 0$ is approximately x = 3 − i.
Find a more accurate value of this root by one application of Laguerre's iterative
formula.

Solution Use the given estimate of the root as the starting value. Thus

$$x = 3 - i \qquad x^2 = 8 - 6i \qquad x^3 = 18 - 26i$$

Substituting these values in $P_3(x)$ and its derivatives, we get

$$P_3(x) = x^3 - 4.0x^2 - 4.48x + 26.1 = (18 - 26i) - 4.0(8 - 6i) - 4.48(3 - i) + 26.1 = -1.34 + 2.48i$$

$$P_3'(x) = 3.0x^2 - 8.0x - 4.48 = 3.0(8 - 6i) - 8.0(3 - i) - 4.48 = -4.48 - 10.0i$$

$$P_3''(x) = 6.0x - 8.0 = 6.0(3 - i) - 8.0 = 10.0 - 6.0i$$

Equations (4.14) then yield

$$G(x) = \frac{P_3'(x)}{P_3(x)} = \frac{-4.48 - 10.0i}{-1.34 + 2.48i} = -2.36557 + 3.08462i$$

$$H(x) = G^2(x) - \frac{P_3''(x)}{P_3(x)} = (-2.36557 + 3.08462i)^2 - \frac{10.0 - 6.0i}{-1.34 + 2.48i} = -0.35995 - 12.48452i$$

The term under the square root sign of the denominator in Eq. (4.16) becomes

$$F(x) = \sqrt{(n - 1)\left[nH(x) - G^2(x)\right]} = \sqrt{2\left[3(-0.35995 - 12.48452i) - (-2.36557 + 3.08462i)^2\right]}$$

$$= \sqrt{5.67822 - 45.71946i} = 5.08670 - 4.49402i$$

Now we must find which sign in Eq. (4.16) produces the larger magnitude of the
denominator:

$$|G(x) + F(x)| = |(-2.36557 + 3.08462i) + (5.08670 - 4.49402i)| = |2.72113 - 1.40940i| = 3.06448$$

$$|G(x) - F(x)| = |(-2.36557 + 3.08462i) - (5.08670 - 4.49402i)| = |-7.45227 + 7.57864i| = 10.62884$$

Using the minus sign, we obtain from Eq. (4.16) the following improved
approximation for the root

$$r = x - \frac{n}{G(x) - F(x)} = (3 - i) - \frac{3}{-7.45227 + 7.57864i} = 3.19790 - 0.79875i$$

Thanks to the good starting value, this approximation is already quite close to the
exact value r = 3.20 − 0.80i.

EXAMPLE 4.12 

Use polyRoots to compute all the roots of $x^4 - 5x^3 - 9x^2 + 155x - 250 = 0$.

Solution The command

>> polyroots([1 -5 -9 155 -250])

results in

ans =
   2.0000
   4.0000 - 3.0000i
   4.0000 + 3.0000i
  -5.0000

There are two real roots (x = 2 and −5) and a pair of complex conjugate roots
(x = 4 ± 3i).

PROBLEM SET 4.2 

Problems 1-5 A zero x = r of $P_n(x)$ is given. Verify that r is indeed a zero, and then
deflate the polynomial, i.e., find $P_{n-1}(x)$ so that $P_n(x) = (x - r)P_{n-1}(x)$.

1. $P_3(x) = 3x^3 + 7x^2 - 36x + 20$, r = −5.

2. $P_4(x) = x^4 - 3x^2 + 3x - 1$, r = 1.

3. $P_5(x) = x^5 - 30x^4 + 361x^3 - 2178x^2 + 6588x - 7992$, r = 6.

4. $P_4(x) = x^4 - 5x^3 - 2x^2 - 20x - 24$, r = 2i.

5. $P_3(x) = 3x^3 - 19x^2 + 45x - 13$, r = 3 − 2i.

Problems 6-9 A zero x = r of $P_n(x)$ is given. Determine all the other zeroes of $P_n(x)$
by using a calculator. You should need no tools other than deflation and the quadratic
formula.

6. $P_3(x) = x^3 + 1.8x^2 - 9.01x - 13.398$, r = −3.3.

7. $P_3(x) = x^3 - 6.64x^2 + 16.84x - 8.32$, r = 0.64.

8. $P_3(x) = 2x^3 - 13x^2 + 32x - 13$, r = 3 − 2i.

9. $P_4(x) = x^4 - 3x^3 + 10x^2 - 6x - 20$, r = 1 + 3i.

Problems 10-16 Find all the zeroes of the given $P_n(x)$.

10. ■ $P_4(x) = x^4 + 2.1x^3 - 2.52x^2 + 2.1x - 3.52$.

11. ■ $P_5(x) = x^5 - 156x^4 - 5x^3 + 780x^2 + 4x - 624$.

12. ■ $P_6(x) = x^6 + 4x^5 - 8x^4 - 34x^3 + 57x^2 + 130x - 150$.

13. ■ $P_7(x) = 8x^7 + 28x^6 + 34x^5 - 13x^4 - 124x^3 + 19x^2 + 220x - 100$.

14. ■ $P_8(x) = x^8 - 7x^7 + 7x^6 + 25x^5 + 24x^4 - 98x^3 - 472x^2 + 440x + 800$.

15. ■ $P_4(x) = x^4 + (5 + i)x^3 - (8 - 5i)x^2 + (30 - 14i)x - 84$.

16. ■

The two blocks of mass m each are connected by springs and a dashpot. The
stiffness of each spring is k, and c is the coefficient of damping of the dashpot.
When the system is displaced and released, the displacement of each block during
the ensuing motion has the form

$$x_k(t) = A_ke^{\omega_rt}\cos(\omega_it + \phi_k), \quad k = 1, 2$$

where $A_k$ and $\phi_k$ are constants, and $\omega = \omega_r \pm i\omega_i$ are the roots of

$$\omega^4 + 2\frac{c}{m}\omega^3 + 3\frac{k}{m}\omega^2 + \frac{c}{m}\frac{k}{m}\omega + \left(\frac{k}{m}\right)^2 = 0$$

Determine the two possible combinations of $\omega_r$ and $\omega_i$ if c/m = 12 s⁻¹ and k/m =
1500 s⁻².


MATLAB Functions

x = fzero(@func,x0) returns the zero of the function func closest to x0.
x = fzero(@func,[a b]) can be used when the root has been bracketed in (a, b).

The algorithm used for fzero is Brent's method.

x = roots(a) returns the zeros of the polynomial $P_n(x) = a_1x^n + \cdots + a_nx + a_{n+1}$.

The zeros are obtained by calculating the eigenvalues of the n × n "companion
matrix"

$$\mathbf{A} = \begin{bmatrix} -a_2/a_1 & -a_3/a_1 & \cdots & -a_n/a_1 & -a_{n+1}/a_1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

The characteristic equation (see Art. 9.1) of this matrix is

$$x^n + \frac{a_2}{a_1}x^{n-1} + \cdots + \frac{a_n}{a_1}x + \frac{a_{n+1}}{a_1} = 0$$

which is equivalent to $P_n(x) = 0$. Thus the eigenvalues of A are the zeroes of $P_n(x)$. The
eigenvalue method is robust, but considerably slower than Laguerre's method.
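
The companion-matrix construction described above can be reproduced directly. The sketch below builds the matrix for one quartic and computes its eigenvalues with eig; the polynomial, the list of known roots and the variable names are assumptions of this example, and the code is meant to illustrate the idea rather than to reproduce the internals of roots.

```matlab
% Sketch: zeros of a polynomial as eigenvalues of its companion matrix.
% The polynomial and the variable names are illustrative assumptions.
a = [1 -5 -9 155 -250];           % x^4 - 5x^3 - 9x^2 + 155x - 250
n = length(a) - 1;

% First row: -a(2)/a(1) ... -a(n+1)/a(1); below it an identity block
% shifted one column to the left, padded with a column of zeros.
A = [-a(2:n+1)/a(1); eye(n-1), zeros(n-1,1)];
r = eig(A);                       % eigenvalues = zeros of the polynomial

known = [2; -5; 4 + 3i; 4 - 3i];  % roots of this particular quartic
```

The eigenvalues are returned in no particular order, so a comparison with known roots should match each root to its nearest eigenvalue instead of comparing element by element.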







Numerical Differentiation 


Given the function $f(x)$, compute $d^nf/dx^n$ at given x


5.1 Introduction

Numerical differentiation deals with the following problem: we are given the function
$y = f(x)$ and wish to obtain one of its derivatives at the point $x = x_k$. The term "given"
means that we either have an algorithm for computing the function, or possess a
set of discrete data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$. In either case, we have access to a
finite number of (x, y) data pairs from which to compute the derivative. If you suspect
by now that numerical differentiation is related to interpolation, you are right: one
means of finding the derivative is to approximate the function locally by a polynomial
and then differentiate it. An equally effective tool is the Taylor series expansion of
$f(x)$ about the point $x_k$. The latter has the advantage of providing us with information
about the error involved in the approximation.

Numerical differentiation is not a particularly accurate process. It suffers from 
a conflict between roundoff errors (due to limited machine precision) and errors 
inherent in interpolation. For this reason, a derivative of a function can never be 
computed with the same precision as the function itself. 

5.2 Finite Difference Approximations 

The derivation of the finite difference approximations for the derivatives of $f(x)$ is
based on forward and backward Taylor series expansions of $f(x)$ about x, such as

$$f(x + h) = f(x) + hf'(x) + \frac{h^2}{2!}f''(x) + \frac{h^3}{3!}f'''(x) + \frac{h^4}{4!}f^{(4)}(x) + \cdots \tag{a}$$

$$f(x - h) = f(x) - hf'(x) + \frac{h^2}{2!}f''(x) - \frac{h^3}{3!}f'''(x) + \frac{h^4}{4!}f^{(4)}(x) - \cdots \tag{b}$$

$$f(x + 2h) = f(x) + 2hf'(x) + \frac{(2h)^2}{2!}f''(x) + \frac{(2h)^3}{3!}f'''(x) + \frac{(2h)^4}{4!}f^{(4)}(x) + \cdots \tag{c}$$

$$f(x - 2h) = f(x) - 2hf'(x) + \frac{(2h)^2}{2!}f''(x) - \frac{(2h)^3}{3!}f'''(x) + \frac{(2h)^4}{4!}f^{(4)}(x) - \cdots \tag{d}$$

We also record the sums and differences of the series:

$$f(x + h) + f(x - h) = 2f(x) + h^2f''(x) + \frac{h^4}{12}f^{(4)}(x) + \cdots \tag{e}$$

$$f(x + h) - f(x - h) = 2hf'(x) + \frac{h^3}{3}f'''(x) + \cdots \tag{f}$$

$$f(x + 2h) + f(x - 2h) = 2f(x) + 4h^2f''(x) + \frac{4h^4}{3}f^{(4)}(x) + \cdots \tag{g}$$

$$f(x + 2h) - f(x - 2h) = 4hf'(x) + \frac{8h^3}{3}f'''(x) + \cdots \tag{h}$$

Note that the sums contain only even derivatives, while the differences retain just the
odd derivatives. Equations (a)-(h) can be viewed as simultaneous equations that can
be solved for various derivatives of $f(x)$. The number of equations involved and the
number of terms kept in each equation depend on the order of the derivative and the
desired degree of accuracy.


First Central Difference Approximations 

The solution of Eq. (f) for $f'(x)$ is

$$f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{h^2}{6}f'''(x) - \cdots$$

Keeping only the first term on the right-hand side, we have

$$f'(x) = \frac{f(x + h) - f(x - h)}{2h} + O(h^2) \tag{5.1}$$

which is called the first central difference approximation for $f'(x)$. The term $O(h^2)$
reminds us that the truncation error behaves as $h^2$.

From Eq. (e) we obtain

$$f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} - \frac{h^2}{12}f^{(4)}(x) - \cdots$$

or

$$f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} + O(h^2) \tag{5.2}$$

Central difference approximations for other derivatives can be obtained from
Eqs. (a)-(h) in a similar manner. For example, eliminating $f'(x)$ from Eqs. (f) and (h)
and solving for $f'''(x)$ yield

$$f'''(x) = \frac{f(x + 2h) - 2f(x + h) + 2f(x - h) - f(x - 2h)}{2h^3} + O(h^2) \tag{5.3}$$

The approximation

$$f^{(4)}(x) = \frac{f(x + 2h) - 4f(x + h) + 6f(x) - 4f(x - h) + f(x - 2h)}{h^4} + O(h^2) \tag{5.4}$$

is available from Eqs. (e) and (g) after eliminating $f''(x)$. Table 5.1 summarizes the
results.



                  f(x-2h)   f(x-h)   f(x)   f(x+h)   f(x+2h)
  2h f'(x)                    -1       0       1
  h^2 f''(x)                   1      -2       1
  2h^3 f'''(x)      -1         2       0      -2        1
  h^4 f^(4)(x)       1        -4       6      -4        1

Table 5.1. Coefficients of central finite difference approximations
of $O(h^2)$
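
The coefficients in Table 5.1 translate directly into code. The sketch below applies the first three rows to $f(x) = \sin x$, whose derivatives are known exactly; the test function, the point x, the increment h and the variable names are assumptions of this example.

```matlab
% Sketch: central difference approximations from Table 5.1 applied to sin(x).
% Test function, x, h and variable names are illustrative assumptions.
f = @(x) sin(x);
x = 1.0;
h = 0.01;

d1 = (f(x+h) - f(x-h))/(2*h);                                % f'(x)
d2 = (f(x+h) - 2*f(x) + f(x-h))/h^2;                         % f''(x)
d3 = (f(x+2*h) - 2*f(x+h) + 2*f(x-h) - f(x-2*h))/(2*h^3);    % f'''(x)

err1 = abs(d1 - cos(x));          % exact f'   =  cos(x)
err2 = abs(d2 + sin(x));          % exact f''  = -sin(x)
err3 = abs(d3 + cos(x));          % exact f''' = -cos(x)
```

With h = 0.01 the truncation errors are of the order of $h^2$, i.e., roughly $10^{-5}$; making h much smaller eventually increases the error again because roundoff in the differences is divided by a power of h.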


First Noncentral Finite Difference Approximations 


Central finite difference approximations are not always usable. For example, consider
the situation where the function is given at the n discrete points $x_1, x_2, \ldots, x_n$. Since
central differences use values of the function on each side of x, we would be unable to
compute the derivatives at $x_1$ and $x_n$. Clearly, there is a need for finite difference
expressions that require evaluations of the function only on one side of x. These
expressions are called forward and backward finite difference approximations.

Noncentral finite differences can also be obtained from Eqs. (a)-(h). Solving
Eq. (a) for $f'(x)$ we get

$$f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{h}{2}f''(x) - \frac{h^2}{6}f'''(x) - \frac{h^3}{4!}f^{(4)}(x) - \cdots$$

Keeping only the first term on the right-hand side leads to the first forward difference
approximation

$$f'(x) = \frac{f(x + h) - f(x)}{h} + O(h) \tag{5.5}$$

Similarly, Eq. (b) yields the first backward difference approximation

$$f'(x) = \frac{f(x) - f(x - h)}{h} + O(h) \tag{5.6}$$

Note that the truncation error is now $O(h)$, which is not as good as the $O(h^2)$ error in
central difference approximations.
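
The difference between $O(h)$ and $O(h^2)$ behavior can be observed by halving h and watching how the error shrinks. The following sketch does this for $f(x) = e^x$; the test function, the two increments and the variable names are assumptions of this example.

```matlab
% Sketch: error reduction when h is halved, forward vs. central difference.
% Test function, increments and variable names are illustrative assumptions.
f = @(x) exp(x);
x = 1.0;
h = [0.1, 0.05];                  % h and h/2

errF = zeros(1,2);                % forward difference errors, Eq. (5.5)
errC = zeros(1,2);                % central difference errors, Eq. (5.1)
for k = 1:2
    errF(k) = abs((f(x+h(k)) - f(x))/h(k) - exp(x));
    errC(k) = abs((f(x+h(k)) - f(x-h(k)))/(2*h(k)) - exp(x));
end

ratioF = errF(1)/errF(2);         % close to 2 for an O(h) formula
ratioC = errC(1)/errC(2);         % close to 4 for an O(h^2) formula
```

Halving h roughly halves the error of the forward difference but cuts the error of the central difference by about a factor of four, in line with the truncation terms of the two formulas.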

We can derive the approximations for higher derivatives in the same manner. For
example, Eqs. (a) and (c) yield

$$f''(x) = \frac{f(x + 2h) - 2f(x + h) + f(x)}{h^2} + O(h) \tag{5.7}$$

The third and fourth derivatives can be derived in a similar fashion. The results are
shown in Tables 5.2a and 5.2b.



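|              | $f(x)$ | $f(x+h)$ | $f(x+2h)$ | $f(x+3h)$ | $f(x+4h)$ |
|--------------|--------|----------|-----------|-----------|-----------|
| $hf'(x)$     | -1     | 1        |           |           |           |
| $h^2f''(x)$  | 1      | -2       | 1         |           |           |
| $h^3f'''(x)$ | -1     | 3        | -3        | 1         |           |
| $h^4f^{(4)}(x)$ | 1   | -4       | 6         | -4        | 1         |

Table 5.2a. Coefficients of forward finite difference approximations of $O(h)$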



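|              | $f(x-4h)$ | $f(x-3h)$ | $f(x-2h)$ | $f(x-h)$ | $f(x)$ |
|--------------|-----------|-----------|-----------|----------|--------|
| $hf'(x)$     |           |           |           | -1       | 1      |
| $h^2f''(x)$  |           |           | 1         | -2       | 1      |
| $h^3f'''(x)$ |           | -1        | 3         | -3       | 1      |
| $h^4f^{(4)}(x)$ | 1      | -4        | 6         | -4       | 1      |

Table 5.2b. Coefficients of backward finite difference approximations of $O(h)$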


Second Noncentral Finite Difference Approximations 

Finite difference approximations of $O(h)$ are not popular due to reasons that will be
explained shortly. The common practice is to use expressions of $O(h^2)$. To obtain
noncentral difference formulas of this order, we have to retain more terms in the
Taylor series. As an illustration, we will derive the expression for $f'(x)$. We start with
Eqs. (a) and (c), which are

$$f(x+h) = f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + \frac{h^4}{24}f^{(4)}(x) + \cdots$$

$$f(x+2h) = f(x) + 2hf'(x) + 2h^2f''(x) + \frac{4h^3}{3}f'''(x) + \frac{2h^4}{3}f^{(4)}(x) + \cdots$$

We eliminate $f''(x)$ by multiplying the first equation by 4 and subtracting it from the
second equation. The result is

$$f(x+2h) - 4f(x+h) = -3f(x) - 2hf'(x) + \frac{2h^3}{3}f'''(x) + \cdots$$

Therefore,

$$f'(x) = \frac{-f(x+2h) + 4f(x+h) - 3f(x)}{2h} + \frac{h^2}{3}f'''(x) + \cdots$$

or

$$f'(x) = \frac{-f(x+2h) + 4f(x+h) - 3f(x)}{2h} + O(h^2) \tag{5.8}$$

Equation (5.8) is called the second forward finite difference approximation. 
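
The difference between an $O(h)$ and an $O(h^2)$ formula is easiest to see by halving $h$ and watching how the error shrinks. The following sketch compares Eq. (5.5) with Eq. (5.8); the test function $e^{-x}$, the point $x = 1$ and the two step sizes are assumptions made only for this illustration.

```matlab
% Compare the convergence of the first (5.5) and second (5.8) forward differences.
f      = @(x) exp(-x);   % test function (assumed for illustration)
dExact = -exp(-1);       % exact f'(1)
x      = 1.0;
h      = [0.1 0.05];     % two step sizes, the second half of the first

fwd1 = (f(x+h) - f(x))./h;                       % Eq. (5.5), O(h)
fwd2 = (-f(x+2*h) + 4*f(x+h) - 3*f(x))./(2*h);   % Eq. (5.8), O(h^2)

err1   = abs(fwd1 - dExact);
err2   = abs(fwd2 - dExact);
ratio1 = err1(1)/err1(2);   % expected to be close to 2
ratio2 = err2(1)/err2(2);   % expected to be close to 4
fprintf('error ratio, O(h) formula:   %.2f\n', ratio1);
fprintf('error ratio, O(h^2) formula: %.2f\n', ratio2);
```

Halving $h$ roughly halves the error of the first formula and roughly quarters the error of the second, which is what the orders of the truncation errors predict.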

Derivation of finite difference approximations for higher derivatives involves
additional Taylor series. Thus the forward difference approximation for $f''(x)$ utilizes
series for $f(x+h)$, $f(x+2h)$ and $f(x+3h)$; the approximation for $f'''(x)$ involves
Taylor expansions for $f(x+h)$, $f(x+2h)$, $f(x+3h)$ and $f(x+4h)$, etc. As you can see,
the computations for high-order derivatives can become rather tedious. The results
for both the forward and backward finite differences are summarized in Tables 5.3a
and 5.3b.



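|              | $f(x)$ | $f(x+h)$ | $f(x+2h)$ | $f(x+3h)$ | $f(x+4h)$ | $f(x+5h)$ |
|--------------|--------|----------|-----------|-----------|-----------|-----------|
| $2hf'(x)$    | -3     | 4        | -1        |           |           |           |
| $h^2f''(x)$  | 2      | -5       | 4         | -1        |           |           |
| $2h^3f'''(x)$ | -5    | 18       | -24       | 14        | -3        |           |
| $h^4f^{(4)}(x)$ | 3   | -14      | 26        | -24       | 11        | -2        |

Table 5.3a. Coefficients of forward finite difference approximations of $O(h^2)$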



|              | $f(x-5h)$ | $f(x-4h)$ | $f(x-3h)$ | $f(x-2h)$ | $f(x-h)$ | $f(x)$ |
|--------------|-----------|-----------|-----------|-----------|----------|--------|
| $2hf'(x)$    |           |           |           | 1         | -4       | 3      |
| $h^2f''(x)$  |           |           | -1        | 4         | -5       | 2      |
| $2h^3f'''(x)$ |          | 3         | -14       | 24        | -18      | 5      |
| $h^4f^{(4)}(x)$ | -2     | 11        | -24       | 26        | -14      | 3      |

Table 5.3b. Coefficients of backward finite difference approximations of $O(h^2)$


Errors in Finite Difference Approximations 

Observe that in all finite difference expressions the sum of the coefficients is zero.
The effect on the roundoff error can be profound. If h is very small, the values of $f(x)$,
$f(x \pm h)$, $f(x \pm 2h)$, etc. will be approximately equal. When they are multiplied by the
coefficients in the finite difference formulas and added, several significant figures can
be lost. On the other hand, we cannot make h too large, because then the truncation
error would become excessive. This unfortunate situation has no remedy, but we can
obtain some relief by taking the following precautions:

• Use double-precision arithmetic.

• Employ finite difference formulas that are accurate to at least $O(h^2)$.

To illustrate the errors, let us compute the second derivative of $f(x) = e^{-x}$ at $x = 1$
from the central difference formula, Eq. (5.2). We carry out the calculations with six-
and eight-digit precision, using different values of h. The results, shown in Table 5.4,
should be compared with $f''(1) = e^{-1} = 0.36787944$.


| h       | 6-digit precision | 8-digit precision |
|---------|-------------------|-------------------|
| 0.64    | 0.380610          | 0.38060911        |
| 0.32    | 0.371035          | 0.37102939        |
| 0.16    | 0.368711          | 0.36866484        |
| 0.08    | 0.368281          | 0.36807656        |
| 0.04    | 0.36875           | 0.36783125        |
| 0.02    | 0.37              | 0.3679            |
| 0.01    | 0.38              | 0.3679            |
| 0.005   | 0.40              | 0.3676            |
| 0.0025  | 0.48              | 0.3680            |
| 0.00125 | 1.28              | 0.3712            |

Table 5.4. $(e^{-x})''$ at $x = 1$ from central finite difference approximation


In the six-digit computations, the optimal value of h is 0.08, yielding a result 
accurate to three significant figures. Hence three significant figures are lost due to 
a combination of truncation and roundoff errors. Above optimal h, the dominant 
error is due to truncation; below it, the roundoff error becomes pronounced. The 
best result obtained with the eight-digit computation is accurate to four significant 
figures. Because the extra precision decreases the roundoff error, the optimal h is 
smaller (about 0.02) than in the six-figure calculations. 
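
The same trade-off between truncation and roundoff error appears in double precision, only at much smaller values of $h$. The sketch below repeats the experiment of Table 5.4 with the central difference formula for $(e^{-x})''$ at $x = 1$; the particular range of step sizes is an assumption chosen to expose both regimes.

```matlab
% Error of the central difference approximation of f''(1) as h decreases.
f     = @(x) exp(-x);
x     = 1.0;
exact = exp(-1);            % f''(1) for f(x) = exp(-x)
hs    = 10.^(-1:-1:-8);     % h = 1e-1, 1e-2, ..., 1e-8
err   = zeros(size(hs));
for i = 1:numel(hs)
    h      = hs(i);
    d2     = (f(x-h) - 2*f(x) + f(x+h))/h^2;   % central difference of O(h^2)
    err(i) = abs(d2 - exact);
    fprintf('h = %8.1e   error = %10.3e\n', h, err(i));
end
```

The error first decreases with $h$ (truncation dominates), reaches a minimum, and then grows again as the subtraction of nearly equal function values wipes out the significant figures.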
5.3 Richardson Extrapolation

Richardson extrapolation is a simple method for boosting the accuracy of certain 
numerical procedures, including finite difference approximations (we will also use it 
later in numerical integration). 

Suppose that we have an approximate means of computing some quantity G.
Moreover, assume that the result depends on a parameter h. Denoting the approximation
by $g(h)$, we have $G = g(h) + E(h)$, where $E(h)$ represents the error. Richardson
extrapolation can remove the error, provided that it has the form $E(h) = ch^p$, c and p
being constants. We start by computing $g(h)$ with some value of h, say $h = h_1$. In that
case we have

$$G = g(h_1) + ch_1^p \tag{i}$$

Then we repeat the calculation with $h = h_2$, so that

$$G = g(h_2) + ch_2^p \tag{j}$$

Eliminating c and solving for G, we obtain from Eqs. (i) and (j)

$$G = \frac{(h_1/h_2)^p g(h_2) - g(h_1)}{(h_1/h_2)^p - 1} \tag{5.9a}$$

which is the Richardson extrapolation formula. It is common practice to use $h_2 = h_1/2$,
in which case Eq. (5.9a) becomes

$$G = \frac{2^p g(h_1/2) - g(h_1)}{2^p - 1} \tag{5.9b}$$


Let us illustrate Richardson extrapolation by applying it to the finite difference
approximation of $(e^{-x})''$ at $x = 1$. We work with six-digit precision and utilize the
results in Table 5.4. Since the extrapolation works only on the truncation error, we
must confine h to values that produce negligible roundoff. Choosing $h_1 = 0.64$ and
letting $g(h)$ be the approximation of $f''(1)$ obtained with h, we get from Table 5.4

$$g(h_1) = 0.380610 \qquad g(h_1/2) = 0.371035$$

The truncation error in the central difference approximation is
$E(h) = O(h^2) = c_1h^2 + c_2h^4 + c_3h^6 + \cdots$. Therefore, we can eliminate the first
(dominant) error term if we substitute $p = 2$ and $h_1 = 0.64$ in Eq. (5.9b). The result is

$$G = \frac{2^2 g(0.32) - g(0.64)}{2^2 - 1} = \frac{4(0.371035) - 0.380610}{3} = 0.367843$$

which is an approximation of $(e^{-x})''$ with the error $O(h^4)$. Note that it is as accurate as
the best result obtained with eight-digit computations in Table 5.4.
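
Equation (5.9b) is short enough to be written as a one-line anonymous function. The sketch below reproduces the extrapolation just carried out; the name `richardson` is an illustrative choice and not a function from the text.

```matlab
% Richardson extrapolation, Eq. (5.9b): combine g(h) and g(h/2).
richardson = @(gh, gh2, p) (2^p*gh2 - gh)/(2^p - 1);   % hypothetical helper

g1 = 0.380610;   % g(0.64) from Table 5.4 (six-digit column)
g2 = 0.371035;   % g(0.32) from Table 5.4 (six-digit column)
p  = 2;          % leading truncation error of the central difference is c1*h^2

G = richardson(g1, g2, p);
fprintf('G = %.6f, exact = %.6f\n', G, exp(-1));
```

The extrapolated value agrees with $e^{-1}$ to four significant figures, although both inputs are accurate to only two.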


EXAMPLE 5.1 

Given the evenly spaced data points

| x      | 0      | 0.1    | 0.2    | 0.3    | 0.4    |
|--------|--------|--------|--------|--------|--------|
| $f(x)$ | 0.0000 | 0.0819 | 0.1341 | 0.1646 | 0.1797 |

compute $f'(x)$ and $f''(x)$ at $x = 0$ and 0.2 using finite difference approximations of
$O(h^2)$.

Solution From the forward difference formulas in Table 5.3a we get

$$f'(0) = \frac{-3f(0) + 4f(0.1) - f(0.2)}{2(0.1)} = \frac{-3(0) + 4(0.0819) - 0.1341}{0.2} = 0.9675$$

$$f''(0) = \frac{2f(0) - 5f(0.1) + 4f(0.2) - f(0.3)}{(0.1)^2} = \frac{2(0) - 5(0.0819) + 4(0.1341) - 0.1646}{(0.1)^2} = -3.77$$

The central difference approximations in Table 5.1 yield

$$f'(0.2) = \frac{-f(0.1) + f(0.3)}{2(0.1)} = \frac{-0.0819 + 0.1646}{0.2} = 0.4135$$

$$f''(0.2) = \frac{f(0.1) - 2f(0.2) + f(0.3)}{(0.1)^2} = \frac{0.0819 - 2(0.1341) + 0.1646}{(0.1)^2} = -2.17$$
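
The hand computations of Example 5.1 can be repeated in a few lines. This is only a sketch of the arithmetic, with variable names chosen for the illustration.

```matlab
% Example 5.1: derivatives from tabulated data with O(h^2) formulas.
x = [0 0.1 0.2 0.3 0.4];
y = [0.0000 0.0819 0.1341 0.1646 0.1797];
h = 0.1;

% Forward differences at x = 0 (Table 5.3a)
d1_0 = (-3*y(1) + 4*y(2) - y(3))/(2*h);
d2_0 = (2*y(1) - 5*y(2) + 4*y(3) - y(4))/h^2;

% Central differences at x = 0.2 (Table 5.1)
d1_02 = (-y(2) + y(4))/(2*h);
d2_02 = (y(2) - 2*y(3) + y(4))/h^2;

fprintf('f''(0)   = %8.4f   f''''(0)   = %8.4f\n', d1_0, d2_0);
fprintf('f''(0.2) = %8.4f   f''''(0.2) = %8.4f\n', d1_02, d2_02);
```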


EXAMPLE 5.2 

Use the data in Example 5.1 to compute $f'(0)$ as accurately as you can.

Solution One solution is to apply Richardson extrapolation to finite difference
approximations. We start with two forward difference approximations for $f'(0)$: one
using $h = 0.2$ and the other one using $h = 0.1$. Referring to the formulas of $O(h^2)$ in
Table 5.3a, we get

$$g(0.2) = \frac{-3f(0) + 4f(0.2) - f(0.4)}{2(0.2)} = \frac{-3(0) + 4(0.1341) - 0.1797}{0.4} = 0.8918$$

$$g(0.1) = \frac{-3f(0) + 4f(0.1) - f(0.2)}{2(0.1)} = \frac{-3(0) + 4(0.0819) - 0.1341}{0.2} = 0.9675$$

where g denotes the finite difference approximation of $f'(0)$. Recalling that the error
in both approximations is of the form $E(h) = c_1h^2 + c_2h^4 + c_3h^6 + \cdots$, we can use
Richardson extrapolation to eliminate the dominant error term. With $p = 2$ we obtain
from Eq. (5.9)

$$f'(0) \approx G = \frac{2^2 g(0.1) - g(0.2)}{2^2 - 1} = \frac{4(0.9675) - 0.8918}{3} = 0.9927$$

which is a finite difference approximation of $O(h^4)$.

EXAMPLE 5.3 



The linkage shown has the dimensions a = 100 mm, b = 120 mm, c = 150 mm
and d = 180 mm. It can be shown by geometry that the relationship between the
angles $\alpha$ and $\beta$ is

$$(d - a\cos\alpha - b\cos\beta)^2 + (a\sin\alpha + b\sin\beta)^2 - c^2 = 0$$

For a given value of $\alpha$, we can solve this transcendental equation for $\beta$ by one of the
root-finding methods in Chapter 4. This was done with $\alpha = 0°, 5°, 10°, \ldots, 30°$, the
results being

| $\alpha$ (deg) | 0      | 5      | 10     | 15     | 20     | 25     | 30     |
|----------------|--------|--------|--------|--------|--------|--------|--------|
| $\beta$ (rad)  | 1.6595 | 1.5434 | 1.4186 | 1.2925 | 1.1712 | 1.0585 | 0.9561 |

If link AB rotates with the constant angular velocity of 25 rad/s, use finite difference
approximations of $O(h^2)$ to tabulate the angular velocity $d\beta/dt$ of link BC against $\alpha$.

Solution The angular speed of BC is

$$\frac{d\beta}{dt} = \frac{d\beta}{d\alpha}\frac{d\alpha}{dt} = 25\frac{d\beta}{d\alpha} \text{ rad/s}$$

where $d\beta/d\alpha$ is computed from finite difference approximations using the data in the
table. Forward and backward differences of $O(h^2)$ are used at the endpoints, central
differences elsewhere. Note that the increment of $\alpha$ is

$$h = (5 \text{ deg})\left(\frac{\pi}{180} \text{ rad/deg}\right) = 0.087266 \text{ rad}$$

The computations yield

$$\dot\beta(0°) = 25\,\frac{-3\beta(0°) + 4\beta(5°) - \beta(10°)}{2h} = 25\,\frac{-3(1.6595) + 4(1.5434) - 1.4186}{2(0.087266)} = -32.01 \text{ rad/s}$$

$$\dot\beta(5°) = 25\,\frac{\beta(10°) - \beta(0°)}{2h} = 25\,\frac{1.4186 - 1.6595}{2(0.087266)} = -34.51 \text{ rad/s}$$

etc.

The complete set of results is

| $\alpha$ (deg)        | 0      | 5      | 10     | 15     | 20     | 25     | 30     |
|-----------------------|--------|--------|--------|--------|--------|--------|--------|
| $\dot\beta$ (rad/s)   | -32.01 | -34.51 | -35.94 | -35.44 | -33.52 | -30.81 | -27.86 |
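
The tabulation in Example 5.3 lends itself to vectorized code: one forward difference, a block of central differences, and one backward difference. The following is a minimal sketch of that computation; the variable names are illustrative.

```matlab
% Example 5.3: angular velocity of link BC by O(h^2) finite differences.
alphaDeg = 0:5:30;
beta     = [1.6595 1.5434 1.4186 1.2925 1.1712 1.0585 0.9561];   % rad
omega    = 25;          % angular velocity of AB, rad/s
h        = 5*pi/180;    % increment of alpha in rad
n        = numel(beta);

dbeta        = zeros(1, n);
dbeta(1)     = (-3*beta(1) + 4*beta(2) - beta(3))/(2*h);      % forward (Table 5.3a)
dbeta(2:n-1) = (beta(3:n) - beta(1:n-2))/(2*h);               % central (Table 5.1)
dbeta(n)     = (beta(n-2) - 4*beta(n-1) + 3*beta(n))/(2*h);   % backward (Table 5.3b)

betaDot = omega*dbeta;  % d(beta)/dt in rad/s
fprintf('%4d  %8.2f\n', [alphaDeg; betaDot]);
```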

5.4 Derivatives by Interpolation 

If $f(x)$ is given as a set of discrete data points, interpolation can be a very effective
means of computing its derivatives. The idea is to approximate the derivative of $f(x)$
by the derivative of the interpolant. This method is particularly useful if the data points
are located at uneven intervals of x, when the finite difference approximations listed
in the last article are not applicable.⁹

Polynomial Interpolant

The idea here is simple: fit the polynomial of degree $n - 1$

$$P_{n-1}(x) = a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n \tag{a}$$

through n data points and then evaluate its derivatives at the given x. As pointed out 
in Art. 3.2, it is generally advisable to limit the degree of the polynomial to less than 
six in order to avoid spurious oscillations of the interpolant. Since these oscillations 
are magnified with each differentiation, their effect can be devastating. In view of the 
above limitation, the interpolation should usually be a local one, involving no more 
than a few nearest-neighbor data points. 

For evenly spaced data points, polynomial interpolation and finite difference 
approximations produce identical results. In fact, the finite difference formulas are 
equivalent to polynomial interpolation. 
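
The equivalence is easy to confirm numerically. The sketch below passes a quadratic through three evenly spaced points taken from Example 5.1 and compares its derivatives at the middle point with the central difference formulas. It relies on MATLAB's built-in `polyfit`, `polyder` and `polyval`, which are not the functions used elsewhere in this text.

```matlab
% Quadratic interpolation versus central differences on evenly spaced data.
x = [0.1 0.2 0.3];            % three evenly spaced points from Example 5.1
y = [0.0819 0.1341 0.1646];
h = 0.1;

p   = polyfit(x, y, 2);       % quadratic through the three points
dp  = polyder(p);             % coefficients of P'
d2p = polyder(dp);            % coefficients of P''

d1_interp = polyval(dp, 0.2);
d2_interp = polyval(d2p, 0.2);
d1_fd     = (y(3) - y(1))/(2*h);            % central difference for f'
d2_fd     = (y(1) - 2*y(2) + y(3))/h^2;     % central difference for f''
fprintf('%10.6f %10.6f\n', d1_interp, d1_fd, d2_interp, d2_fd);
```

Both rows of the output should show identical values apart from roundoff.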

Several methods of polynomial interpolation were introduced in Art. 3.2. Unfortunately,
none of them is suited for the computation of derivatives. The method that
we need is one that determines the coefficients $a_1, a_2, \ldots, a_n$ of the polynomial in
Eq. (a). There is only one such method discussed in Chapter 3: the least-squares fit.
Although this method is designed mainly for smoothing of data, it will carry out
interpolation if we use $m = n$ in Eq. (3.22). If the data contains noise, then the least-squares
fit should be used in the smoothing mode, that is, with $m < n$. After the coefficients of
the polynomial have been found, the polynomial and its first two derivatives can be
evaluated efficiently by the function evalpoly listed in Art. 4.7.

⁹ It is possible to derive finite difference approximations for unevenly spaced data, but they would
not be as accurate as the formulas derived in Art. 5.2.


Cubic Spline Interpolant 


Due to its stiffness, a cubic spline is a good global interpolant; moreover, it is easy to
differentiate. The first step is to determine the second derivatives $k_i$ of the spline at
the knots by solving Eqs. (3.12). This can be done with the function splineCurv as
explained in Art. 3.3. The first and second derivatives are then computed from

$$f'_{i,i+1}(x) = \frac{k_i}{6}\left[\frac{3(x - x_{i+1})^2}{x_i - x_{i+1}} - (x_i - x_{i+1})\right] - \frac{k_{i+1}}{6}\left[\frac{3(x - x_i)^2}{x_i - x_{i+1}} - (x_i - x_{i+1})\right] + \frac{y_i - y_{i+1}}{x_i - x_{i+1}} \tag{5.10}$$

$$f''_{i,i+1}(x) = k_i\frac{x - x_{i+1}}{x_i - x_{i+1}} - k_{i+1}\frac{x - x_i}{x_i - x_{i+1}} \tag{5.11}$$

which are obtained by differentiation of Eq. (3.10).


EXAMPLE 5.4 

Given the data

| x      | 1.5    | 1.9    | 2.1    | 2.4    | 2.6    | 3.1    |
|--------|--------|--------|--------|--------|--------|--------|
| $f(x)$ | 1.0628 | 1.3961 | 1.5432 | 1.7349 | 1.8423 | 2.0397 |

compute $f'(2)$ and $f''(2)$ using (1) polynomial interpolation over three nearest-neighbor
points, and (2) natural cubic spline interpolant spanning all the data points.

Solution of Part (1) Let the interpolant passing through the points at x = 1.9, 2.1 and
2.4 be $P_2(x) = a_1 + a_2x + a_3x^2$. The normal equations, Eqs. (3.23), of the least-squares
fit are

$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum y_ix_i \\ \sum y_ix_i^2 \end{bmatrix}$$

After substituting the data, we get

$$\begin{bmatrix} 3 & 6.4 & 13.78 \\ 6.4 & 13.78 & 29.944 \\ 13.78 & 29.944 & 65.6578 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 4.6742 \\ 10.0571 \\ 21.8385 \end{bmatrix}$$

which yields $\mathbf{a} = \begin{bmatrix} -0.7714 & 1.5075 & -0.1930 \end{bmatrix}^T$. Thus the interpolant and its
derivatives are

$$P_2(x) = -0.1930x^2 + 1.5075x - 0.7714$$
$$P_2'(x) = -0.3860x + 1.5075$$
$$P_2''(x) = -0.3860$$

which gives us

$$f'(2) \approx P_2'(2) = -0.3860(2) + 1.5075 = 0.7355$$
$$f''(2) \approx P_2''(2) = -0.3860$$
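
Part (1) can also be checked with MATLAB's built-in `polyfit`, which solves the same interpolation problem when the degree equals the number of points minus one. This is a sketch using built-in functions rather than the least-squares routine of Chapter 3; note that `polyfit` returns the coefficients with the highest power first, the reverse of the ordering of $a_1, a_2, a_3$ used above.

```matlab
% Example 5.4, Part (1): quadratic through the three points nearest x = 2.
x = [1.9 2.1 2.4];
y = [1.3961 1.5432 1.7349];

p  = polyfit(x, y, 2);          % p(1)*x^2 + p(2)*x + p(3)
d1 = polyval(polyder(p), 2);    % P2'(2)
d2 = 2*p(1);                    % P2'' is constant for a quadratic
fprintf('f''(2) = %.4f   f''''(2) = %.4f\n', d1, d2);
```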

Solution of Part (2) We must first determine the second derivatives $k_i$ of the spline
at its knots, after which the derivatives of $f(x)$ can be computed from Eqs. (5.10) and
(5.11). The first part can be carried out by the following small program:

% Example 5.4 (Curvatures of cubic spline at the knots)
xData = [1.5; 1.9; 2.1; 2.4; 2.6; 3.1];
yData = [1.0628; 1.3961; 1.5432; 1.7349; 1.8423; 2.0397];
k = splineCurv(xData,yData)

The output of the program, consisting of $k_1$ to $k_6$, is

>> k =
         0
   -0.4258
   -0.3774
   -0.3880
   -0.5540
         0

Since x = 2 lies between knots 2 and 3, we must use Eqs. (5.10) and (5.11) with
i = 2. This yields

$$f'(2) \approx f'_{2,3}(2) = \frac{k_2}{6}\left[\frac{3(x - x_3)^2}{x_2 - x_3} - (x_2 - x_3)\right] - \frac{k_3}{6}\left[\frac{3(x - x_2)^2}{x_2 - x_3} - (x_2 - x_3)\right] + \frac{y_2 - y_3}{x_2 - x_3}$$

$$= \frac{(-0.4258)}{6}\left[\frac{3(2 - 2.1)^2}{(-0.2)} - (-0.2)\right] - \frac{(-0.3774)}{6}\left[\frac{3(2 - 1.9)^2}{(-0.2)} - (-0.2)\right] + \frac{1.3961 - 1.5432}{(-0.2)} = 0.7351$$

$$f''(2) \approx f''_{2,3}(2) = k_2\frac{x - x_3}{x_2 - x_3} - k_3\frac{x - x_2}{x_2 - x_3} = (-0.4258)\frac{2 - 2.1}{(-0.2)} - (-0.3774)\frac{2 - 1.9}{(-0.2)} = -0.4016$$
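
Equations (5.10) and (5.11) can be evaluated for any $x$ once the segment containing it has been located. The sketch below does this for $x = 2$; to stay self-contained it copies the curvatures printed above instead of calling splineCurv, and the segment search with `find` is an assumption of this illustration rather than a routine from the text.

```matlab
% Example 5.4, Part (2): derivatives of the cubic spline at x = 2.
xData = [1.5 1.9 2.1 2.4 2.6 3.1];
yData = [1.0628 1.3961 1.5432 1.7349 1.8423 2.0397];
k     = [0 -0.4258 -0.3774 -0.3880 -0.5540 0];   % curvatures copied from the output

x  = 2.0;
i  = find(xData <= x, 1, 'last');    % knot at or to the left of x
i  = min(i, numel(xData) - 1);       % keep i+1 within range
dx = xData(i) - xData(i+1);          % x_i - x_(i+1), a negative number

% Eq. (5.10)
d1 = k(i)/6*(3*(x - xData(i+1))^2/dx - dx) ...
   - k(i+1)/6*(3*(x - xData(i))^2/dx - dx) ...
   + (yData(i) - yData(i+1))/dx;
% Eq. (5.11)
d2 = k(i)*(x - xData(i+1))/dx - k(i+1)*(x - xData(i))/dx;
fprintf('f''(2) = %.4f   f''''(2) = %.4f\n', d1, d2);
```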


Note that the solutions for $f'(2)$ in parts (1) and (2) differ only in the fourth significant
figure, but the values of $f''(2)$ are much farther apart. This is not unexpected, considering
the general rule: the higher the order of the derivative, the lower the precision
with which it can be computed. It is impossible to tell which of the two results is
better without knowing the expression for $f(x)$. In this particular problem, the data
points fall on the curve $f(x) = x^2e^{-x/2}$, so that the “correct” values of the derivatives
are $f'(2) = 0.7358$ and $f''(2) = -0.3679$.


EXAMPLE 5.5 

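Determine $f'(0)$ and $f'(1)$ from the following noisy data

| x      | 0      | 0.2    | 0.4    | 0.6    | 0.8    | 1.0    | 1.2    | 1.4    |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| $f(x)$ | 1.9934 | 2.1465 | 2.2129 | 2.1790 | 2.0683 | 1.9448 | 1.7655 | 1.5891 |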

Solution We used the program listed in Example 3.10 to find the best polynomial fit 
(in the least-squares sense) to the data. The results were: 

degree of polynomial = 2
coeff =
  -7.0240e-001
   6.4704e-001
   2.0262e+000
sigma =
   3.6097e-002

degree of polynomial = 3
coeff =
   4.0521e-001
  -1.5533e+000
   1.0928e+000
   1.9921e+000
sigma =
   8.2604e-003

degree of polynomial = 4
coeff =
  -1.5329e-002
   4.4813e-001
  -1.5906e+000
   1.1028e+000
   1.9919e+000
sigma =
   9.5193e-003

degree of polynomial =
Done

Based on standard deviation, the cubic seems to be the best candidate for the 
interpolant. Before accepting the result, we compare the plots of the data points and 
the interpolant—see the figure. The fit does appear to be satisfactory. 


[Figure: data points and the cubic interpolant; f(x), from 1.5 to 2.3, plotted against x, from 0.00 to 1.40]


Approximating $f(x)$ by the interpolant, we have

$$f(x) \approx a_1x^3 + a_2x^2 + a_3x + a_4$$

so that

$$f'(x) \approx 3a_1x^2 + 2a_2x + a_3$$

Therefore,

$$f'(0) \approx a_3 = 1.093$$

$$f'(1) \approx 3a_1 + 2a_2 + a_3 = 3(0.405) + 2(-1.553) + 1.093 = -0.798$$

In general, derivatives obtained from noisy data are at best rough approximations.
In this problem, the data represent $f(x) = (x + 2)/\cosh x$ with added random noise.
Thus $f'(x) = [1 - (x + 2)\tanh x]/\cosh x$, so that the “correct” derivatives are $f'(0) =
1.000$ and $f'(1) = -0.833$.
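
Once the coefficients of the cubic are known, the differentiation itself is mechanical. The sketch below takes the coefficients printed for the cubic fit and differentiates them with MATLAB's built-in `polyder`; this stands in for the evalpoly function of Art. 4.7 and is an assumption of the illustration.

```matlab
% Example 5.5: derivatives of the cubic least-squares fit.
coeff  = [4.0521e-001 -1.5533e+000 1.0928e+000 1.9921e+000];   % highest power first
dcoeff = polyder(coeff);        % [3*a1 2*a2 a3]
d0     = polyval(dcoeff, 0);    % f'(0)
d1     = polyval(dcoeff, 1);    % f'(1)

exact = @(x) (1 - (x + 2).*tanh(x))./cosh(x);   % derivative of (x+2)/cosh(x)
fprintf('x = 0: fit %8.3f   exact %8.3f\n', d0, exact(0));
fprintf('x = 1: fit %8.3f   exact %8.3f\n', d1, exact(1));
```

The discrepancy between the fitted and exact derivatives (several percent here) is typical of what noise does to a derivative estimate.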

PROBLEM SET 5.1 

1. Given the values of $f(x)$ at the points $x$, $x - h_1$ and $x + h_2$, determine the finite
difference approximation for $f''(x)$. What is the order of the truncation error?

2. Given the first backward finite difference approximations for $f'(x)$ and $f''(x)$,
derive the first backward finite difference approximation for $f'''(x)$ using the
operation $f'''(x) = [f''(x)]'$.

3. Derive the central difference approximation for $f''(x)$ accurate to $O(h^4)$ by applying
Richardson extrapolation to the central difference approximation of $O(h^2)$.

4. Derive the second forward finite difference approximation for $f'''(x)$ from the
Taylor series.

5. Derive the first central difference approximation for $f^{(4)}(x)$ from the Taylor series.

6. Use finite difference approximations of $O(h^2)$ to compute $f'(2.36)$ and $f''(2.36)$
from the data

| x      | 2.36    | 2.37    | 2.38    | 2.39    |
|--------|---------|---------|---------|---------|
| $f(x)$ | 0.85866 | 0.86289 | 0.86710 | 0.87129 |


7. Estimate $f'(1)$ and $f''(1)$ from the following data:

| x      | 0.97    | 1.00    | 1.05    |
|--------|---------|---------|---------|
| $f(x)$ | 0.85040 | 0.84147 | 0.82612 |


8. Given the data

| x      | 0.84     | 0.92     | 1.00     | 1.08     | 1.16     |
|--------|----------|----------|----------|----------|----------|
| $f(x)$ | 0.431711 | 0.398519 | 0.367879 | 0.339596 | 0.313486 |

calculate $f''(1)$ as accurately as you can.

9. Use the data in the table to compute $f'(0.2)$ as accurately as possible.

| x      | 0        | 0.1      | 0.2      | 0.3      | 0.4      |
|--------|----------|----------|----------|----------|----------|
| $f(x)$ | 0.000000 | 0.078348 | 0.138910 | 0.192916 | 0.244981 |


10. Using five significant figures in the computations, determine $d(\sin x)/dx$ at x =
0.8 from (a) the first forward difference approximation, and (b) the first central
difference approximation. In each case, use h that gives the most accurate result
(this requires experimentation).

11. ■ Use polynomial interpolation to compute $f'$ and $f''$ at x = 0, using the data

| x      | -2.2   | -0.3   | 0.8   | 1.9    |
|--------|--------|--------|-------|--------|
| $f(x)$ | 15.180 | 10.962 | 1.920 | -2.040 |


12. ■ The crank AB of length R = 90 mm is rotating at the constant angular speed of
$d\theta/dt = 5000$ rev/min. The position of the piston C can be shown to vary with the
angle $\theta$ as

$$x = R\left(\cos\theta + \sqrt{2.5^2 - \sin^2\theta}\right)$$

Write a program that computes the acceleration of the piston at $\theta =
0°, 5°, 10°, \ldots, 180°$ by numerical differentiation.

13. The radar stations A and B, separated by the distance a = 500 m, track the plane
C by recording the angles $\alpha$ and $\beta$ at one-second intervals. If three successive
readings are

| t (s)    | 9      | 10     | 11     |
|----------|--------|--------|--------|
| $\alpha$ | 54.80° | 54.06° | 53.34° |
| $\beta$  | 65.59° | 64.59° | 63.62° |

calculate the speed v of the plane and the climb angle $\gamma$ at t = 10 s. The coordinates
of the plane can be shown to be

$$x = a\frac{\tan\beta}{\tan\beta - \tan\alpha} \qquad y = a\frac{\tan\alpha\tan\beta}{\tan\beta - \tan\alpha}$$

14. ■ Geometric analysis of the linkage shown resulted in the following table relating
the angles $\theta$ and $\beta$:

| $\theta$ (deg) | 0     | 30    | 60    | 90    | 120   | 150    |
|----------------|-------|-------|-------|-------|-------|--------|
| $\beta$ (deg)  | 59.96 | 56.42 | 44.10 | 25.72 | -0.27 | -34.29 |

Assuming that member AB of the linkage rotates with the constant angular velocity
$d\theta/dt = 1$ rad/s, compute $d\beta/dt$ in rad/s at the tabulated values of $\theta$. Use
cubic spline interpolation.

MATLAB Functions 

d = diff(y) returns the differences d(i) = y(i+1) - y(i). Note that
length(d) = length(y) - 1.

dn = diff(y,n) returns the nth differences; e.g., d2(i) = d(i+1) - d(i),
d3(i) = d2(i+1) - d2(i), etc. Here length(dn) = length(y) - n.

d = gradient(y,h) returns the finite difference approximation of dy/dx at each
point, where h is the spacing between the points.

d2 = del2(y,h) returns the finite difference approximation of $(d^2y/dx^2)/4$ at each
point, where h is the spacing between the points.
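
A short session with the data of Example 5.1 shows how these functions relate to the formulas of this chapter. The sketch exercises only `diff` and `gradient`; the remarks about which differences `gradient` uses at interior and end points describe its documented behavior for vector input.

```matlab
% Built-in differencing functions applied to the data of Example 5.1.
y = [0.0000 0.0819 0.1341 0.1646 0.1797];
h = 0.1;

d1 = diff(y)/h;          % first forward differences, length(y) - 1 values
d2 = diff(y, 2)/h^2;     % second differences, centered on the interior points
g  = gradient(y, h);     % central differences inside, one-sided at the ends

fprintf('diff(y)/h      : %s\n', mat2str(d1, 5));
fprintf('diff(y,2)/h^2  : %s\n', mat2str(d2, 5));
fprintf('gradient(y,h)  : %s\n', mat2str(g, 5));
```

Here `g(3)` and `d2(2)` reproduce the central difference results $f'(0.2) = 0.4135$ and $f''(0.2) = -2.17$ of Example 5.1, whereas `g(1)` is only the first-order forward difference.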
Numerical Integration

Compute $\int_a^b f(x)\,dx$, where $f(x)$ is a given function

6.1 Introduction

Numerical integration, also known as quadrature, is intrinsically a much more accurate
procedure than numerical differentiation. Quadrature approximates the definite
integral

$$\int_a^b f(x)\,dx$$

by the sum

$$I = \sum_{i=1}^{n} A_i f(x_i)$$

where the nodal abscissas $x_i$ and weights $A_i$ depend on the particular rule used for the
quadrature. All rules of quadrature are derived from polynomial interpolation of the
integrand. Therefore, they work best if $f(x)$ can be approximated by a polynomial.

Methods of numerical integration can be divided into two groups: Newton-Cotes
formulas and Gaussian quadrature. Newton-Cotes formulas are characterized by
equally spaced abscissas, and include well-known methods such as the trapezoidal
rule and Simpson’s rule. They are most useful if $f(x)$ has already been computed at
equal intervals, or can be computed at low cost. Since Newton-Cotes formulas are
based on local interpolation, they require only a piecewise fit to a polynomial.

In Gaussian quadrature the locations of the abscissas are chosen to yield the best
possible accuracy. Because Gaussian quadrature requires fewer evaluations of the
integrand for a given level of precision, it is popular in cases where $f(x)$ is expensive to
evaluate. Another advantage of Gaussian quadrature is its ability to handle integrable
singularities, enabling us to evaluate expressions such as

$$\int_0^1 \frac{g(x)}{\sqrt{1 - x^2}}\,dx$$

provided that $g(x)$ is a well-behaved function.


6.2 Newton-Cotes Formulas 



Consider the definite integral

$$\int_a^b f(x)\,dx \tag{6.1}$$

We divide the range of integration $(a, b)$ into $n - 1$ equal intervals of length $h =
(b - a)/(n - 1)$ each, as shown in Fig. 6.1, and denote the abscissas of the resulting
nodes by $x_1, x_2, \ldots, x_n$. Next we approximate $f(x)$ by a polynomial of degree $n - 1$ that
intersects all the nodes. Lagrange’s form of this polynomial, Eq. (3.1a), is

$$P_{n-1}(x) = \sum_{i=1}^{n} f(x_i)\ell_i(x)$$

where $\ell_i(x)$ are the cardinal functions defined in Eq. (3.1b). Therefore, an approximation
to the integral in Eq. (6.1) is

$$I = \int_a^b P_{n-1}(x)\,dx = \sum_{i=1}^{n}\left[f(x_i)\int_a^b \ell_i(x)\,dx\right] = \sum_{i=1}^{n} A_i f(x_i) \tag{6.2a}$$

where

$$A_i = \int_a^b \ell_i(x)\,dx, \quad i = 1, 2, \ldots, n \tag{6.2b}$$

Equations (6.2) are the Newton-Cotes formulas. Classical examples of these formulas
are the trapezoidal rule $(n = 2)$, Simpson’s rule $(n = 3)$ and Simpson’s 3/8 rule
$(n = 4)$. The most important of these is the trapezoidal rule. It can be combined with
Richardson extrapolation into an efficient algorithm known as Romberg integration,
which makes the other classical rules somewhat redundant.

Trapezoidal Rule

Figure 6.2. Trapezoidal rule.

If $n = 2$, we have $\ell_1 = (x - x_2)/(x_1 - x_2) = -(x - b)/h$. Therefore,

$$A_1 = -\frac{1}{h}\int_a^b (x - b)\,dx = \frac{1}{2h}(b - a)^2 = \frac{h}{2}$$

Also $\ell_2 = (x - x_1)/(x_2 - x_1) = (x - a)/h$, so that

$$A_2 = \frac{1}{h}\int_a^b (x - a)\,dx = \frac{1}{2h}(b - a)^2 = \frac{h}{2}$$

Substitution in Eq. (6.2a) yields

$$I = [f(a) + f(b)]\frac{h}{2} \tag{6.3}$$

which is known as the trapezoidal rule. It represents the area of the trapezoid in Fig. 6.2.
The error in the trapezoidal rule

$$E = \int_a^b f(x)\,dx - I$$

is the area of the region between $f(x)$ and the straight-line interpolant, as indicated
in Fig. 6.2. It can be obtained by integrating the interpolation error in Eq. (4.3):

$$E = \frac{1}{2!}\int_a^b (x - x_1)(x - x_2)f''(\xi)\,dx = \frac{1}{2}f''(\xi)\int_a^b (x - a)(x - b)\,dx = -\frac{1}{12}(b - a)^3f''(\xi) = -\frac{h^3}{12}f''(\xi) \tag{6.4}$$

Composite Trapezoidal Rule

In practice the trapezoidal rule is applied in a piecewise fashion. Figure 6.3 shows
the region $(a, b)$ divided into $n - 1$ panels, each of width h. The function $f(x)$ to be
integrated is approximated by a straight line in each panel. From the trapezoidal rule
we obtain for the approximate area of a typical (ith) panel

$$I_i = [f(x_i) + f(x_{i+1})]\frac{h}{2}$$

Hence total area, representing $\int_a^b f(x)\,dx$, is

$$I = \sum_{i=1}^{n-1} I_i = [f(x_1) + 2f(x_2) + 2f(x_3) + \cdots + 2f(x_{n-1}) + f(x_n)]\frac{h}{2} \tag{6.5}$$

which is the composite trapezoidal rule.
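
Equation (6.5) maps directly onto vectorized code. The following sketch evaluates it for $\int_0^\pi \sin x\,dx$ with eight panels; the integrand and the number of panels are assumptions chosen for the illustration, and the comparison with MATLAB's built-in `trapz` is only a cross-check.

```matlab
% Composite trapezoidal rule, Eq. (6.5), with n = nPanels + 1 nodes.
f       = @(x) sin(x);      % integrand (assumed for illustration)
a       = 0;  b = pi;
nPanels = 8;
x       = linspace(a, b, nPanels + 1);
h       = (b - a)/nPanels;
fx      = f(x);

I = (fx(1) + 2*sum(fx(2:end-1)) + fx(end))*h/2;
fprintf('I = %.5f   trapz = %.5f   exact = 2\n', I, trapz(x, fx));
```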

The truncation error in the area of a panel is, from Eq. (6.4),

$$E_i = -\frac{h^3}{12}f''(\xi_i)$$

where $\xi_i$ lies in $(x_i, x_{i+1})$. Hence the truncation error in Eq. (6.5) is

$$E = \sum_{i=1}^{n-1} E_i = -\frac{h^3}{12}\sum_{i=1}^{n-1} f''(\xi_i) \tag{a}$$

But

$$\sum_{i=1}^{n-1} f''(\xi_i) = (n - 1)\bar{f}''$$

where $\bar{f}''$ is the arithmetic mean of the second derivatives. If $f''(x)$ is continuous, there
must be a point $\xi$ in $(a, b)$ at which $f''(\xi) = \bar{f}''$, enabling us to write

$$\sum_{i=1}^{n-1} f''(\xi_i) = (n - 1)f''(\xi) = \frac{b - a}{h}f''(\xi)$$

Therefore, Eq. (a) becomes

$$E = -\frac{(b - a)h^2}{12}f''(\xi) \tag{6.6}$$

It would be incorrect to conclude from Eq. (6.6) that $E = ch^2$ (c being a constant),
because $f''(\xi)$ is not entirely independent of h. A deeper analysis of the error¹⁰ shows
that if $f(x)$ and its derivatives are finite in $(a, b)$, then

$$E = c_1h^2 + c_2h^4 + c_3h^6 + \cdots \tag{6.7}$$

¹⁰ The analysis requires familiarity with the Euler-Maclaurin summation formula, which is covered
in advanced texts.

Recursive Trapezoidal Rule

Let $I_k$ be the integral evaluated with the composite trapezoidal rule using $2^{k-1}$ panels.
Note that if k is increased by one, the number of panels is doubled. Using the notation

$$H = b - a$$

we obtain from Eq. (6.5) the following results for k = 1, 2 and 3.

k = 1 (1 panel):

$$I_1 = [f(a) + f(b)]\frac{H}{2} \tag{6.8}$$

k = 2 (2 panels):

$$I_2 = \left[f(a) + 2f\left(a + \frac{H}{2}\right) + f(b)\right]\frac{H}{4} = \frac{1}{2}I_1 + f\left(a + \frac{H}{2}\right)\frac{H}{2}$$

k = 3 (4 panels):

$$I_3 = \left[f(a) + 2f\left(a + \frac{H}{4}\right) + 2f\left(a + \frac{H}{2}\right) + 2f\left(a + \frac{3H}{4}\right) + f(b)\right]\frac{H}{8} = \frac{1}{2}I_2 + \left[f\left(a + \frac{H}{4}\right) + f\left(a + \frac{3H}{4}\right)\right]\frac{H}{4}$$

We can now see that for arbitrary k > 1 we have

$$I_k = \frac{1}{2}I_{k-1} + \frac{H}{2^{k-1}}\sum_{i=1}^{2^{k-2}} f\left[a + \frac{(2i - 1)H}{2^{k-1}}\right], \quad k = 2, 3, \ldots \tag{6.9a}$$

which is the recursive trapezoidal rule. Observe that the summation contains only
the new nodes that were created when the number of panels was doubled. Therefore,
the computation of the sequence $I_1, I_2, I_3, \ldots, I_k$ from Eqs. (6.8) and (6.9) involves the
same amount of algebra as the calculation of $I_k$ directly from Eq. (6.5). The advantage
of using the recursive trapezoidal rule is that it allows us to monitor convergence and
terminate the process when the difference between $I_{k-1}$ and $I_k$ becomes sufficiently
small. A form of Eq. (6.9a) that is easier to remember is

$$I(h) = \frac{1}{2}I(2h) + h\sum f(x_{\text{new}}) \tag{6.9b}$$

where $h = H/(n - 1)$ is the width of each panel.


■ trapezoid

The function trapezoid computes $I(h)$, given $I(2h)$ from Eqs. (6.8) and (6.9). We
can compute $\int_a^b f(x)\,dx$ by calling trapezoid repeatedly with k = 1, 2, ... until the
desired precision is attained.

function Ih = trapezoid(func,a,b,I2h,k)
% Recursive trapezoidal rule.
% USAGE: Ih = trapezoid(func,a,b,I2h,k)
% func = handle of function being integrated.
% a,b  = limits of integration.
% I2h  = integral with 2^(k-2) panels (not used when k = 1).
% Ih   = integral with 2^(k-1) panels.

if k == 1
    fa = feval(func,a); fb = feval(func,b);
    Ih = (fa + fb)*(b - a)/2.0;
else
    n = 2^(k-2);        % Number of new points
    h = (b - a)/n;      % Spacing of new points
    x = a + h/2.0;      % Coord. of 1st new point
    sum = 0.0;
    for i = 1:n
        fx = feval(func,x);
        sum = sum + fx;
        x = x + h;
    end
    Ih = (I2h + h*sum)/2.0;
end
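
The driver loop that monitors convergence might look as follows. To keep the sketch runnable without the file trapezoid.m, the recurrence of Eq. (6.9a) is repeated inline and vectorized; the integrand, the tolerance `tol` and the cap of 25 levels are assumptions made for the illustration.

```matlab
% Recursive trapezoidal rule with a convergence check (inline version).
f   = @(x) sin(x);    % integrand (assumed for illustration)
a   = 0;  b = pi;
tol = 1.0e-6;         % hypothetical convergence tolerance

I = (f(a) + f(b))*(b - a)/2;          % k = 1, Eq. (6.8)
for k = 2:25
    n    = 2^(k-2);                   % number of new points
    h    = (b - a)/n;                 % spacing of new points
    xNew = a + h/2 + (0:n-1)*h;       % midpoints of the old panels
    Inew = (I + h*sum(f(xNew)))/2;    % Eq. (6.9a)
    converged = abs(Inew - I) < tol;
    I = Inew;
    if converged, break; end
end
fprintf('I = %.6f with %d panels\n', I, 2^(k-1));
```

Because each pass evaluates the integrand only at the new nodes, the total work is the same as a single application of Eq. (6.5) with the final number of panels.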


Simpson's Rules 



Figure 6.4. Simpson’s 1/3 rule.

Simpson’s 1/3 rule can be obtained from Newton-Cotes formulas with $n = 3$;
that is, by passing a parabolic interpolant through three adjacent nodes, as shown in
Fig. 6.4. The area under the parabola, which represents an approximation of $\int_a^b f(x)\,dx$,
is (see derivation in Example 6.1)

$$I = \left[f(a) + 4f\left(\frac{a + b}{2}\right) + f(b)\right]\frac{h}{3} \tag{a}$$

Figure 6.5. Composite Simpson’s 1/3 rule.


To obtain the composite Simpson’s 1/3 rule, the integration range $(a, b)$ is divided
into $n - 1$ panels (n odd) of width $h = (b - a)/(n - 1)$ each, as indicated in Fig. 6.5.
Applying Eq. (a) to two adjacent panels, we have

$$\int_{x_i}^{x_{i+2}} f(x)\,dx \approx [f(x_i) + 4f(x_{i+1}) + f(x_{i+2})]\frac{h}{3} \tag{b}$$

Substituting Eq. (b) into

$$\int_a^b f(x)\,dx = \int_{x_1}^{x_n} f(x)\,dx = \sum_{i=1,3,\ldots}^{n-2}\left[\int_{x_i}^{x_{i+2}} f(x)\,dx\right]$$

yields

$$\int_a^b f(x)\,dx \approx I = [f(x_1) + 4f(x_2) + 2f(x_3) + 4f(x_4) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)]\frac{h}{3} \tag{6.10}$$

The composite Simpson’s 1/3 rule in Eq. (6.10) is perhaps the best-known method of
numerical integration. Its reputation is somewhat undeserved, since the trapezoidal
rule is more robust, and Romberg integration is more efficient.

The error in the composite Simpson’s rule is

$$E = \frac{(b - a)h^4}{180}f^{(4)}(\xi) \tag{6.11}$$

from which we conclude that Eq. (6.10) is exact if $f(x)$ is a polynomial of degree three
or less.

Simpson’s 1/3 rule requires the number of panels to be even. If this condition is
not satisfied, we can integrate over the first (or last) three panels with Simpson’s 3/8
rule:

$$I = [f(x_1) + 3f(x_2) + 3f(x_3) + f(x_4)]\frac{3h}{8} \tag{6.12}$$

and use Simpson’s 1/3 rule for the remaining panels. The error in Eq. (6.12) is of the
same order as in Eq. (6.10).
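
The weight pattern 1, 4, 2, 4, ..., 2, 4, 1 of Eq. (6.10) can be produced with strided indexing. The sketch below applies it to $\int_0^\pi \sin x\,dx$ and, as a check on the claim of exactness, to a cubic; both integrands and the panel counts are assumptions chosen for the illustration.

```matlab
% Composite Simpson's 1/3 rule, Eq. (6.10); the number of panels must be even.
f       = @(x) sin(x);
a       = 0;  b = pi;
nPanels = 8;
x       = linspace(a, b, nPanels + 1);
h       = (b - a)/nPanels;
fx      = f(x);
I = (fx(1) + 4*sum(fx(2:2:end-1)) + 2*sum(fx(3:2:end-2)) + fx(end))*h/3;

% A cubic is integrated exactly: integral of x^3 from 0 to 2 equals 4
g  = @(x) x.^3;
xc = linspace(0, 2, 5);  hc = 0.5;  gx = g(xc);
Ic = (gx(1) + 4*sum(gx(2:2:end-1)) + 2*sum(gx(3:2:end-2)) + gx(end))*hc/3;

fprintf('sin: I = %.6f (exact 2)   x^3: I = %.6f (exact 4)\n', I, Ic);
```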


EXAMPLE 6.1 

Derive Simpson’s 1/3 rule from Newton-Cotes formulas.

Solution Referring to Fig. 6.4, we see that Simpson’s 1/3 rule uses three nodes located
at $x_1 = a$, $x_2 = (a + b)/2$ and $x_3 = b$. The spacing of the nodes is $h = (b - a)/2$. The
cardinal functions of Lagrange’s three-point interpolation are (see Art. 3.2)

$$\ell_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} \qquad \ell_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} \qquad \ell_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$

The integration of these functions is easier if we introduce the variable $\xi$ with origin
at $x_2$. Then the coordinates of the nodes are $\xi_1 = -h$, $\xi_2 = 0$, $\xi_3 = h$ and Eq. (6.2b)
becomes $A_i = \int_a^b \ell_i(x)\,dx = \int_{-h}^{h} \ell_i(\xi)\,d\xi$. Therefore,

$$A_1 = \int_{-h}^{h}\frac{(\xi - 0)(\xi - h)}{(-h)(-2h)}\,d\xi = \frac{1}{2h^2}\int_{-h}^{h}(\xi^2 - h\xi)\,d\xi = \frac{h}{3}$$

$$A_2 = \int_{-h}^{h}\frac{(\xi + h)(\xi - h)}{(h)(-h)}\,d\xi = -\frac{1}{h^2}\int_{-h}^{h}(\xi^2 - h^2)\,d\xi = \frac{4h}{3}$$

$$A_3 = \int_{-h}^{h}\frac{(\xi + h)(\xi - 0)}{(2h)(h)}\,d\xi = \frac{1}{2h^2}\int_{-h}^{h}(\xi^2 + h\xi)\,d\xi = \frac{h}{3}$$

Equation (6.2a) then yields

$$I = \sum_{i=1}^{3} A_i f(x_i) = \left[f(a) + 4f\left(\frac{a + b}{2}\right) + f(b)\right]\frac{h}{3}$$

which is Simpson’s 1/3 rule.


EXAMPLE 6.2 

Evaluate the bounds on $\int_0^\pi \sin(x)\,dx$ with the composite trapezoidal rule using (1)
eight panels and (2) sixteen panels.

Solution of Part (1) With 8 panels there are 9 nodes spaced at $h = \pi/8$. The abscissas
of the nodes are $x_i = (i-1)\pi/8$, $i = 1, 2, \ldots, 9$. From Eq. (6.5) we get

$$I = \left[ \sin 0 + 2\sum_{i=2}^{8} \sin\frac{(i-1)\pi}{8} + \sin\pi \right]\frac{\pi}{16} = 1.974\,23$$

The error is given by Eq. (6.6):

$$E = -\frac{(b-a)h^2}{12} f''(\xi) = -\frac{(\pi - 0)(\pi/8)^2}{12}(-\sin\xi) = \frac{\pi^3}{768}\sin\xi$$

where $0 < \xi < \pi$. Since we do not know the value of $\xi$, we cannot evaluate $E$, but we
can determine its bounds:

$$E_{\min} = \frac{\pi^3}{768}\sin(0) = 0 \qquad E_{\max} = \frac{\pi^3}{768}\sin\frac{\pi}{2} = 0.040\,37$$

Therefore, $I + E_{\min} < \int_0^\pi \sin(x)\,dx < I + E_{\max}$, or

$$1.974\,23 < \int_0^\pi \sin(x)\,dx < 2.014\,60$$

The exact integral is, of course, $I = 2$.

Solution of Part (2) The new nodes created by the doubling of panels are located at
midpoints of the old panels. Their abscissas are

$$x_j = \frac{\pi}{16} + (j-1)\frac{\pi}{8} = (2j-1)\frac{\pi}{16}, \quad j = 1, 2, \ldots, 8$$

Using the recursive trapezoidal rule in Eq. (6.9b), we get

$$I = \frac{1.974\,23}{2} + \frac{\pi}{16}\sum_{j=1}^{8} \sin\frac{(2j-1)\pi}{16} = 1.993\,58$$

and the bounds on the error become (note that $E$ is quartered when $h$ is halved)
$E_{\min} = 0$, $E_{\max} = 0.040\,37/4 = 0.010\,09$. Hence

$$1.993\,58 < \int_0^\pi \sin(x)\,dx < 2.003\,67$$
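The hand computation above can be reproduced with a few lines of MATLAB. This is an illustrative sketch, not the book’s `trapezoid` function; all variable names are assumptions.

```matlab
f = @(x) sin(x);
a = 0; b = pi;

% Part (1): composite trapezoidal rule with 8 panels
n  = 8; h = (b - a)/n;
x  = a + (0:n)*h;
I8 = (f(x(1)) + 2*sum(f(x(2:n))) + f(x(n+1)))*h/2;
Emax8 = (b - a)*h^2/12;            % error bound, using |f''| <= 1

% Part (2): recursive step to 16 panels; only the 8 new midpoints are evaluated
xNew = a + (2*(1:n) - 1)*h/2;
I16  = I8/2 + (h/2)*sum(f(xNew));
Emax16 = Emax8/4;                  % E is quartered when h is halved

bounds8  = [I8,  I8 + Emax8]
bounds16 = [I16, I16 + Emax16]
```

Both intervals should bracket the exact value 2. Small differences in the last printed digit relative to the hand computation are possible, because the text carries rounded intermediate values.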

EXAMPLE 6.3 

Estimate $\int_0^{2.5} f(x)\,dx$ from the data

x       0        0.5      1.0      1.5      2.0      2.5
f(x)    1.5000   2.0000   2.0000   1.6364   1.2500   0.9565

Solution We will use Simpson’s rules, since they are more accurate than the trapezoidal
rule. Because the number of panels is odd, we compute the integral over the
first three panels by Simpson’s 3/8 rule, and use the 1/3 rule for the last two panels:

$$I = \left[ f(0) + 3f(0.5) + 3f(1.0) + f(1.5) \right]\frac{3(0.5)}{8} + \left[ f(1.5) + 4f(2.0) + f(2.5) \right]\frac{0.5}{3}$$

$$= 2.8381 + 1.2655 = 4.1036$$
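The same arithmetic in MATLAB, as a short sketch (variable names are illustrative):

```matlab
x = [0 0.5 1.0 1.5 2.0 2.5];
f = [1.5000 2.0000 2.0000 1.6364 1.2500 0.9565];
h = x(2) - x(1);                               % uniform spacing, 0.5

I38 = (f(1) + 3*f(2) + 3*f(3) + f(4))*3*h/8;   % Simpson's 3/8 rule, first three panels
I13 = (f(4) + 4*f(5) + f(6))*h/3;              % Simpson's 1/3 rule, last two panels
I   = I38 + I13
```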


EXAMPLE 6.4 

Use the recursive trapezoidal rule to evaluate $\int_0^\pi \sqrt{x}\cos x\,dx$ to six decimal places.
How many function evaluations are required to achieve this result?

Solution The program listed below utilizes the function trapezoid. Apart from the 
value of the integral, it displays the number of function evaluations used in the 
computation. 


















% Example 6.4 (Recursive trapezoidal rule)
format long                        % Display extra precision
I2h = 0;
for k = 1:20
    Ih = trapezoid(@fex6_4,0,pi,I2h,k);
    if (k > 1 && abs(Ih - I2h) < 1.0e-6)
        Integral = Ih
        No_of_func_evaluations = 2^(k-1) + 1
        return
    end
    I2h = Ih;
end
error('Too many iterations')

The M-file containing the function to be integrated is 

function y = fex6_4(x) 

% Function used in Example 6.4 
y = sqrt(x)*cos(x); 

Here is the output: 

>> Integral =
  -0.89483166485329
No_of_func_evaluations =
       32769

Rounding to six decimal places, we have $\int_0^\pi \sqrt{x}\cos x\,dx = -0.894\,832$.

The number of function evaluations is unusually large in this problem. The slow
convergence is the result of the derivatives of $f(x)$ being singular at $x = 0$. Consequently,
the error does not behave as shown in Eq. (6.7): $E = c_1h^2 + c_2h^4 + \cdots$, but is
unpredictable. Difficulties of this nature can often be remedied by a change in variable.
In this case, we introduce $t = \sqrt{x}$, so that $dt = dx/(2\sqrt{x}) = dx/(2t)$, or $dx = 2t\,dt$.
Thus

$$\int_0^\pi \sqrt{x}\cos x\,dx = \int_0^{\sqrt{\pi}} 2t^2\cos t^2\,dt$$

Evaluation of the integral on the right-hand side would require 4097 function
evaluations.
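The effect of the change of variable can be observed with the following sketch. It is not the book’s `trapezoid` routine: for simplicity it recomputes the composite trapezoidal rule with `trapz` at every doubling, and `numEval` is reported as the number of nodes in the final grid (what a recursive implementation would have evaluated in total). All names are illustrative.

```matlab
f1 = @(x) sqrt(x).*cos(x);          % original integrand on (0, pi)
f2 = @(t) 2*t.^2.*cos(t.^2);        % transformed integrand on (0, sqrt(pi))
fs = {f1, f2};
bs = [pi, sqrt(pi)];
tol = 1.0e-6;
I = zeros(1,2); numEval = zeros(1,2);
for m = 1:2
    f = fs{m}; b = bs(m);
    Iold = 0;
    for k = 1:20
        n = 2^(k-1);                         % number of panels at level k
        x = linspace(0,b,n+1);
        Inew = trapz(x,f(x));                % composite trapezoidal rule
        if k > 1 && abs(Inew - Iold) < tol   % same stopping test as Example 6.4
            I(m) = Inew; numEval(m) = n + 1;
            break
        end
        Iold = Inew;
    end
end
I
numEval
```

The transformed integrand has bounded derivatives at the lower limit, so it should reach the tolerance with far fewer nodes than the original.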
6.3 Romberg Integration

Romberg integration combines the composite trapezoidal rule with Richardson
extrapolation (see Art. 5.3). Let us first introduce the notation

$$R_{i,1} = I_i$$

where, as before, $I_i$ represents the approximate value of $\int_a^b f(x)\,dx$ computed by the
recursive trapezoidal rule using $2^{i-1}$ panels. Recall that the error in this approximation
is $E = c_1h^2 + c_2h^4 + \cdots$, where

$$h = \frac{b-a}{2^{i-1}}$$

is the width of a panel.

Romberg integration starts with the computation of $R_{1,1} = I_1$ (one panel) and
$R_{2,1} = I_2$ (two panels) from the trapezoidal rule. The leading error term $c_1h^2$ is then
eliminated by Richardson extrapolation. Using $p = 2$ (the exponent in the error term)
in Eq. (5.9) and denoting the result by $R_{2,2}$, we obtain

$$R_{2,2} = \frac{2^2 R_{2,1} - R_{1,1}}{2^2 - 1} = \frac{4}{3}R_{2,1} - \frac{1}{3}R_{1,1} \tag{a}$$

It is convenient to store the results in an array of the form

$$\begin{bmatrix} R_{1,1} & \\ R_{2,1} & R_{2,2} \end{bmatrix}$$

The next step is to calculate $R_{3,1} = I_3$ (four panels) and repeat Richardson
extrapolation with $R_{2,1}$ and $R_{3,1}$, storing the result as $R_{3,2}$:

$$R_{3,2} = \frac{4}{3}R_{3,1} - \frac{1}{3}R_{2,1} \tag{b}$$

The elements of array $R$ calculated so far are

$$\begin{bmatrix} R_{1,1} & \\ R_{2,1} & R_{2,2} \\ R_{3,1} & R_{3,2} \end{bmatrix}$$

Both elements of the second column have an error of the form $c_2h^4$, which can also
be eliminated with Richardson extrapolation. Using $p = 4$ in Eq. (5.9), we get

$$R_{3,3} = \frac{2^4 R_{3,2} - R_{2,2}}{2^4 - 1} = \frac{16}{15}R_{3,2} - \frac{1}{15}R_{2,2} \tag{c}$$

This result has an error of $O(h^6)$. The array has now expanded to

$$\begin{bmatrix} R_{1,1} & & \\ R_{2,1} & R_{2,2} & \\ R_{3,1} & R_{3,2} & R_{3,3} \end{bmatrix}$$
After another round of calculations we get

$$\begin{bmatrix} R_{1,1} & & & \\ R_{2,1} & R_{2,2} & & \\ R_{3,1} & R_{3,2} & R_{3,3} & \\ R_{4,1} & R_{4,2} & R_{4,3} & R_{4,4} \end{bmatrix}$$

where the error in $R_{4,4}$ is $O(h^8)$. Note that the most accurate estimate of the integral
is always the last diagonal term of the array. This process is continued until the
difference between two successive diagonal terms becomes sufficiently small. The general
extrapolation formula used in this scheme is

$$R_{i,j} = \frac{4^{j-1} R_{i,j-1} - R_{i-1,j-1}}{4^{j-1} - 1}, \quad i > 1, \quad j = 2, 3, \ldots, i \tag{6.13a}$$

A pictorial representation of Eq. (6.13a) is

$$\begin{array}{ccc} R_{i-1,j-1} & & \\ & \searrow \alpha & \\ R_{i,j-1} & \rightarrow\ \beta\ \rightarrow & R_{i,j} \end{array} \tag{6.13b}$$

where the multipliers $\alpha$ and $\beta$ depend on $j$ in the following manner:

j    2       3        4        5          6
α    -1/3    -1/15    -1/63    -1/255     -1/1023
β    4/3     16/15    64/63    256/255    1024/1023
                                                     (6.13c)
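The multipliers in Eq. (6.13c) follow directly from Eq. (6.13a): $\beta = 4^{j-1}/(4^{j-1}-1)$ and $\alpha = -1/(4^{j-1}-1)$. A brief sketch that regenerates the table (variable names are illustrative):

```matlab
j     = 2:6;
c     = 4.^(j - 1);
alpha = -1./(c - 1)      % multiplies R(i-1,j-1)
beta  =  c./(c - 1)      % multiplies R(i,j-1)
```

Note that $\alpha + \beta = 1$ for every $j$, as expected of an extrapolation that leaves a converged value unchanged.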


The triangular array is convenient for hand computations, but computer
implementation of the Romberg algorithm can be carried out within a one-dimensional
array $r$. After the first extrapolation (see Eq. (a)) $R_{1,1}$ is never used again, so that it
can be replaced with $R_{2,2}$. As a result, we have the array

$$\begin{bmatrix} r_1 = R_{2,2} \\ r_2 = R_{2,1} \end{bmatrix}$$

In the second extrapolation round, defined by Eqs. (b) and (c), $R_{3,2}$ overwrites $R_{2,1}$,
and $R_{3,3}$ replaces $R_{2,2}$, so that the array now contains

$$\begin{bmatrix} r_1 = R_{3,3} \\ r_2 = R_{3,2} \\ r_3 = R_{3,1} \end{bmatrix}$$

and so on. In this manner, $r_1$ always contains the best current result. The extrapolation
formula for the $k$th round is

$$r_j = \frac{4^{k-j} r_{j+1} - r_j}{4^{k-j} - 1}, \quad j = k-1, k-2, \ldots, 1 \tag{6.14}$$
■ romberg

The algorithm for Romberg integration is implemented in the function romberg. It 
returns the value of the integral and the required number of function evaluations. 
Richardson’s extrapolation is performed by the subfunction richardson. 


function [I,numEval] = romberg(func,a,b,tol,kMax)
% Romberg integration.
% USAGE: [I,numEval] = romberg(func,a,b,tol,kMax)
% INPUT:
% func    = handle of function being integrated.
% a,b     = limits of integration.
% tol     = error tolerance (default is 1.0e-8).
% kMax    = limit on the number of panel doublings
%           (default is 20).
% OUTPUT:
% I       = value of the integral.
% numEval = number of function evaluations.

if nargin < 5; kMax = 20; end
if nargin < 4; tol = 1.0e-8; end
r = zeros(kMax,1);                  % one-dimensional work array
r(1) = trapezoid(func,a,b,0,1);
rOld = r(1);
for k = 2:kMax
    r(k) = trapezoid(func,a,b,r(k-1),k);
    r = richardson(r,k);
    if abs(r(1) - rOld) < tol
        numEval = 2^(k-1) + 1; I = r(1);
        return
    end
    rOld = r(1);
end
error('Failed to converge')

function r = richardson(r,k)
% Richardson's extrapolation in Eq. (6.14).
for j = k-1:-1:1
    c = 4^(k-j); r(j) = (c*r(j+1) - r(j))/(c-1);
end
EXAMPLE 6.5

Show that $R_{k,2}$ in Romberg integration is identical to the composite Simpson’s 1/3 rule
in Eq. (6.10) with $2^{k-1}$ panels.

Solution Recall that in Romberg integration $R_{k,1} = I_k$ denoted the approximate
integral obtained by the composite trapezoidal rule with $2^{k-1}$ panels. Denoting the
abscissas of the nodes by $x_1, x_2, \ldots, x_n$, we have from the composite trapezoidal rule
in Eq. (6.5)

$$R_{k,1} = I_k = \left[ f(x_1) + 2\sum_{i=2}^{n-1} f(x_i) + f(x_n) \right]\frac{h}{2}$$

When we halve the number of panels (panel width $2h$), only the odd-numbered
abscissas enter the composite trapezoidal rule, yielding

$$R_{k-1,1} = I_{k-1} = \left[ f(x_1) + 2\sum_{i=3,5,\ldots}^{n-2} f(x_i) + f(x_n) \right] h$$

Applying Richardson extrapolation yields

$$R_{k,2} = \frac{4}{3}R_{k,1} - \frac{1}{3}R_{k-1,1}
= \left[ \frac{1}{3}f(x_1) + \frac{4}{3}\sum_{i=2,4,\ldots}^{n-1} f(x_i) + \frac{2}{3}\sum_{i=3,5,\ldots}^{n-2} f(x_i) + \frac{1}{3}f(x_n) \right] h$$

which agrees with Simpson’s rule in Eq. (6.10).
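The identity can be confirmed numerically. The sketch below assumes the integrand $e^x$ on $(0,1)$ and $k = 4$ (eight panels); both choices and all names are illustrative.

```matlab
f = @(x) exp(x);
a = 0; b = 1;
k = 4;  n = 2^(k-1);                         % 8 panels, n+1 = 9 nodes
x = linspace(a,b,n+1);  y = f(x);  h = (b - a)/n;

Rk1   = trapz(x,y);                          % trapezoidal rule with n panels
Rkm11 = trapz(x(1:2:end),y(1:2:end));        % trapezoidal rule with n/2 panels
Rk2   = (4*Rk1 - Rkm11)/3;                   % Richardson extrapolation, p = 2

% Composite Simpson's 1/3 rule on the same n panels
S = (y(1) + 4*sum(y(2:2:n)) + 2*sum(y(3:2:n-1)) + y(n+1))*h/3;
difference = abs(Rk2 - S)                    % roundoff only
```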


EXAMPLE 6.6 

Use Romberg integration to evaluate $\int_0^\pi f(x)\,dx$, where $f(x) = \sin x$. Work with four
decimal places.

Solution From the recursive trapezoidal rule in Eq. (6.9b) we get

$$R_{1,1} = I(\pi) = \frac{\pi}{2}\left[ f(0) + f(\pi) \right] = 0$$

$$R_{2,1} = I(\pi/2) = \frac{1}{2}I(\pi) + \frac{\pi}{2}f(\pi/2) = 1.5708$$

$$R_{3,1} = I(\pi/4) = \frac{1}{2}I(\pi/2) + \frac{\pi}{4}\left[ f(\pi/4) + f(3\pi/4) \right] = 1.8961$$

$$R_{4,1} = I(\pi/8) = \frac{1}{2}I(\pi/4) + \frac{\pi}{8}\left[ f(\pi/8) + f(3\pi/8) + f(5\pi/8) + f(7\pi/8) \right] = 1.9742$$

Using the extrapolation formulas in Eqs. (6.13), we can now construct the following
table:

$$\begin{bmatrix} R_{1,1} & & & \\ R_{2,1} & R_{2,2} & & \\ R_{3,1} & R_{3,2} & R_{3,3} & \\ R_{4,1} & R_{4,2} & R_{4,3} & R_{4,4} \end{bmatrix}
= \begin{bmatrix} 0 & & & \\ 1.5708 & 2.0944 & & \\ 1.8961 & 2.0046 & 1.9986 & \\ 1.9742 & 2.0003 & 2.0000 & 2.0000 \end{bmatrix}$$

It appears that the procedure has converged. Therefore, $\int_0^\pi \sin x\,dx = R_{4,4} = 2.0000$,
which is, of course, the correct result.
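The triangular array of this example can be generated with a short script. This is a sketch that stores the full table for clarity (unlike the one-dimensional storage used in `romberg`); the names are illustrative.

```matlab
f = @(x) sin(x);
a = 0; b = pi;
m = 4;                                        % number of rows in the table
R = zeros(m);
R(1,1) = (f(a) + f(b))*(b - a)/2;             % one panel
for i = 2:m
    h = (b - a)/2^(i-1);                      % panel width at level i
    xNew = a + (1:2:2^(i-1)-1)*h;             % nodes added at this level
    R(i,1) = R(i-1,1)/2 + h*sum(f(xNew));     % recursive trapezoidal rule
    for j = 2:i
        c = 4^(j-1);
        R(i,j) = (c*R(i,j-1) - R(i-1,j-1))/(c - 1);   % Eq. (6.13a)
    end
end
R
```

Entries above the diagonal are left at zero and are not part of the Romberg table.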

EXAMPLE 6.7 

Use Romberg integration to evaluate $\int_0^{\sqrt{\pi}} 2x^2\cos x^2\,dx$ and compare the results with
Example 6.4.

Solution 

>> format long
>> [Integral,numEval] = romberg(@fex6_7,0,sqrt(pi))
Integral =
  -0.89483146948416
numEval =
   257
>>

Here the M-file defining the function to be integrated is

function y = fex6_7(x)
% Function used in Example 6.7
y = 2*(x^2)*cos(x^2);


It is clear that Romberg integration is considerably more efficient than the trape¬ 
zoidal rule. It required 257 function evaluations as compared to 4097 evaluations with 
the composite trapezoidal rule in Example 6.4. 

PROBLEM SET 6.1 

1. Use the recursive trapezoidal rule to evaluate $\int_0^{\pi/4} \ln(1 + \tan x)\,dx$. Explain the
results.

2. The table shows the power $P$ supplied to the driving wheels of a car as a function
of the speed $v$. If the mass of the car is $m = 2000$ kg, determine the time $\Delta t$ it
takes for the car to accelerate from 1 m/s to 6 m/s. Use the trapezoidal rule for
integration. Hint:

$$\Delta t = m\int_{1\,\text{m/s}}^{6\,\text{m/s}} (v/P)\,dv$$

which can be derived from Newton’s law $F = m(dv/dt)$ and the definition of power
$P = Fv$.

v (m/s)   0     1.0    1.8    2.4    3.5    4.4    5.1    6.0
P (kW)    0     4.7    12.2   19.0   31.8   40.1   43.8   43.2

3. Evaluate $\int_{-1}^{1} \cos(2\cos^{-1}x)\,dx$ with Simpson’s 1/3 rule using 2, 4 and 6 panels.
Explain the results.

4. Determine $\int_1^\infty (1 + x^4)^{-1}\,dx$ with the trapezoidal rule using five panels and
compare the result with the “exact” integral 0.243 75. Hint: use the transformation
$x^3 = 1/t$.

5. The table below gives the pull $F$ of the bow as a function of the draw $x$. If the bow
is drawn 0.5 m, determine the speed of the 0.075-kg arrow when it leaves the bow.
Hint: the kinetic energy of the arrow equals the work done in drawing the bow; that
is, $mv^2/2 = \int_0^{0.5\,\text{m}} F\,dx$.

x (m)   0.00   0.05   0.10   0.15   0.20   0.25
F (N)   0      37     71     104    134    161

x (m)   0.30   0.35   0.40   0.45   0.50
F (N)   185    207    225    239    250

6. Evaluate $\int_0^2 (x^5 + 3x^3 - 2)\,dx$ by Romberg integration.

7. Estimate $\int_0^\pi f(x)\,dx$ as accurately as possible, where $f(x)$ is defined by the data

x       0        π/4      π/2      3π/4     π
f(x)    1.0000   0.3431   0.2500   0.3431   1.0000
8. Evaluate

$$\int \frac{\sin x}{\sqrt{x}}\,dx$$

with Romberg integration. Hint: use transformation of variable to eliminate the
indeterminacy at $x = 0$.

9. Show that if $y = f(x)$ is approximated by a natural cubic spline with evenly spaced
knots at $x_1, x_2, \ldots, x_n$, the quadrature formula becomes

$$I = \frac{h}{2}\left( y_1 + 2y_2 + 2y_3 + \cdots + 2y_{n-1} + y_n \right) - \frac{h^3}{24}\left( k_1 + 2k_2 + 2k_3 + \cdots + 2k_{n-1} + k_n \right)$$

where $h$ is the spacing of the knots and $k = y''$. Note that the first part is the
composite trapezoidal rule; the second part may be viewed as a “correction” for
curvature.

10. ■ Use a computer program to evaluate

$$\int \frac{dx}{\sqrt{\sin x}}$$

with Romberg integration. Hint: use the transformation $\sin x = t^2$.

11. ■ The period of a simple pendulum of length $L$ is $\tau = 4\sqrt{L/g}\,h(\theta_0)$, where $g$ is
the gravitational acceleration, $\theta_0$ represents the angular amplitude and

$$h(\theta_0) = \int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - \sin^2(\theta_0/2)\sin^2\theta}}$$

Compute $h(15^\circ)$, $h(30^\circ)$ and $h(45^\circ)$, and compare these values with $h(0) = \pi/2$ (the
approximation used for small amplitudes).



The figure shows an elastic half-space that carries uniform loading of intensity $q$
over a circular area of radius $a$. The vertical displacement of the surface at point
$P$ can be shown to be

$$w(r) = w_0 \int_0^{\pi/2} \frac{\cos^2\theta}{\sqrt{(r/a)^2 - \sin^2\theta}}\,d\theta, \quad r \ge a$$

where $w_0$ is the displacement at $r = a$. Use numerical integration to determine
$w/w_0$ at $r = 2a$.


13. ■ 



The mass $m$ is attached to a spring of free length $b$ and stiffness $k$. The coefficient
of friction between the mass and the horizontal rod is $\mu$. The acceleration of the
mass can be shown to be (you may wish to prove this) $\ddot{x} = -f(x)$, where



If the mass is released from rest at $x = b$, its speed at $x = 0$ is given by



Compute $v_0$ by numerical integration using the data $m = 0.8$ kg, $b = 0.4$ m,
$\mu = 0.3$, $k = 80$ N/m and $g = 9.81$ m/s$^2$.

14. ■ Debye’s formula for the heat capacity $C_V$ of a solid is $C_V = 9Nk\,g(u)$, where



The terms in this equation are

N = number of particles in the solid
k = Boltzmann constant
u = T/Θ_D
T = absolute temperature
Θ_D = Debye temperature

Compute $g(u)$ from $u = 0$ to 1.0 in intervals of 0.05 and plot the results.

15. ■ A power spike in an electric circuit results in the current

$$i(t) = i_0 e^{-t/t_0}\sin(2t/t_0)$$

across a resistor. The energy $E$ dissipated by the resistor is

$$E = \int_0^\infty R\left[ i(t) \right]^2 dt$$

Find $E$ using the data $i_0 = 100$ A, $R = 0.5\ \Omega$ and $t_0 = 0.01$ s.


6.4 Gaussian Integration 

Gaussian Integration Formulas 

We found that Newton-Cotes formulas for approximating $\int_a^b f(x)\,dx$ work best if $f(x)$
is a smooth function, such as a polynomial. This is also true for Gaussian quadrature.
However, Gaussian formulas are also good at estimating integrals of the form

$$\int_a^b w(x) f(x)\,dx \tag{6.15}$$

where $w(x)$, called the weighting function, can contain singularities, as long as they
are integrable. An example of such an integral is $\int_0^1 (1 + x^2)\ln x\,dx$. Sometimes infinite
limits, as in $\int_0^\infty e^{-x}\sin x\,dx$, can also be accommodated.

Gaussian integration formulas have the same form as Newton-Cotes rules:

$$I = \sum_{i=1}^{n} A_i f(x_i) \tag{6.16}$$

where, as before, $I$ represents the approximation to the integral in Eq. (6.15). The
difference lies in the way that the weights $A_i$ and nodal abscissas $x_i$ are determined. In
Newton-Cotes integration the nodes were evenly spaced in $(a, b)$, i.e., their locations
were predetermined. In Gaussian quadrature the nodes and weights are chosen so
that Eq. (6.16) yields the exact integral if $f(x)$ is a polynomial of degree $2n - 1$ or less;
that is,

$$\int_a^b w(x) P_m(x)\,dx = \sum_{i=1}^{n} A_i P_m(x_i), \quad m \le 2n - 1 \tag{6.17}$$

One way of determining the weights and abscissas is to substitute $P_0(x) = 1$, $P_1(x) = x$,
$\ldots$, $P_{2n-1}(x) = x^{2n-1}$ in Eq. (6.17) and solve the resulting $2n$ equations

$$\int_a^b w(x) x^j\,dx = \sum_{i=1}^{n} A_i x_i^j, \quad j = 0, 1, \ldots, 2n - 1$$

for the unknowns $A_i$ and $x_i$, $i = 1, 2, \ldots, n$.
As an illustration, let $w(x) = e^{-x}$, $a = 0$, $b = \infty$ and $n = 2$. The four equations
determining $x_1$, $x_2$, $A_1$ and $A_2$ are

$$\int_0^\infty e^{-x}\,dx = A_1 + A_2$$

$$\int_0^\infty e^{-x} x\,dx = A_1x_1 + A_2x_2$$

$$\int_0^\infty e^{-x} x^2\,dx = A_1x_1^2 + A_2x_2^2$$

$$\int_0^\infty e^{-x} x^3\,dx = A_1x_1^3 + A_2x_2^3$$

After evaluating the integrals, we get

$$A_1 + A_2 = 1$$
$$A_1x_1 + A_2x_2 = 1$$
$$A_1x_1^2 + A_2x_2^2 = 2$$
$$A_1x_1^3 + A_2x_2^3 = 6$$

The solution is

$$x_1 = 2 - \sqrt{2} \qquad A_1 = \frac{\sqrt{2} + 1}{2\sqrt{2}}$$

$$x_2 = 2 + \sqrt{2} \qquad A_2 = \frac{\sqrt{2} - 1}{2\sqrt{2}}$$

so that the quadrature formula becomes

$$\int_0^\infty e^{-x} f(x)\,dx \approx \frac{1}{2\sqrt{2}}\left[ (\sqrt{2} + 1) f(2 - \sqrt{2}) + (\sqrt{2} - 1) f(2 + \sqrt{2}) \right]$$

Due to the nonlinearity of the equations, this approach will not work well for
large $n$. Practical methods of finding $x_i$ and $A_i$ require some knowledge of orthogonal
polynomials and their relationship to Gaussian quadrature. There are, however,
several “classical” Gaussian integration formulas for which the abscissas and weights
have been computed with great precision and tabulated. These formulas can be used
without knowing the theory behind them, since all one needs for Gaussian integration
are the values of $x_i$ and $A_i$. If you do not intend to venture outside the classical
formulas, you can skip the next two topics.
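The two-node solution obtained above for $w(x) = e^{-x}$ can be verified by substituting it back into the four moment equations. A minimal sketch (names are illustrative); the right-hand sides are $\int_0^\infty e^{-x}x^j\,dx = j!$.

```matlab
s = sqrt(2);
x = [2 - s, 2 + s];               % nodal abscissas
A = [s + 1, s - 1]/(2*s);         % weights
moments = zeros(1,4);
for j = 0:3
    moments(j+1) = sum(A.*x.^j);  % left-hand side: sum of A_i*x_i^j
end
moments
exact = factorial(0:3)            % right-hand sides: 1, 1, 2, 6
```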
*Orthogonal Polynomials

Orthogonal polynomials are employed in many areas of mathematics and numerical
analysis. They have been studied thoroughly and many of their properties are known.
What follows is a very small compendium of a large topic.

The polynomials $\varphi_n(x)$, $n = 0, 1, 2, \ldots$ ($n$ is the degree of the polynomial) are said
to form an orthogonal set in the interval $(a, b)$ with respect to the weighting function
$w(x)$ if

$$\int_a^b w(x)\varphi_m(x)\varphi_n(x)\,dx = 0, \quad m \ne n \tag{6.18}$$


The set is determined, except for a constant factor, by the choice of the weighting func¬ 
tion and the limits of integration. That is, each set of orthogonal polynomials is asso¬ 
ciated with certain w(x), a and b. The constant factor is specified by standardization. 
Some of the classical orthogonal polynomials, named after well-known mathemati¬ 
cians, are listed in Table 6.1. The last column in the table shows the standardization 
used. 


Name        Symbol    a     b     w(x)               ∫_a^b w(x)[φ_n(x)]² dx
Legendre    p_n(x)    -1    1     1                  2/(2n + 1)
Chebyshev   T_n(x)    -1    1     (1 - x²)^(-1/2)    π/2 (n > 0)
Laguerre    L_n(x)    0     ∞     e^(-x)             1
Hermite     H_n(x)    -∞    ∞     e^(-x²)            √π 2^n n!

Table 6.1


Orthogonal polynomials obey recurrence relations of the form

$$a_n\varphi_{n+1}(x) = (b_n + c_n x)\varphi_n(x) - d_n\varphi_{n-1}(x) \tag{6.19}$$

If the first two polynomials of the set are known, the other members of the set can be
computed from Eq. (6.19). The coefficients in the recurrence formula, together with
$\varphi_0(x)$ and $\varphi_1(x)$, are given in Table 6.2.

Name        φ_0(x)   φ_1(x)   a_n      b_n       c_n       d_n
Legendre    1        x        n + 1    0         2n + 1    n
Chebyshev   1        x        1        0         2         1
Laguerre    1        1 - x    n + 1    2n + 1    -1        n
Hermite     1        2x       1        0         2         2n

Table 6.2
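As an illustration of Eq. (6.19), the sketch below builds the first few Legendre polynomials as coefficient vectors (highest power first) from the Legendre row of Table 6.2, that is, $(n+1)p_{n+1}(x) = (2n+1)x\,p_n(x) - n\,p_{n-1}(x)$. The cell array `P` and the other names are illustrative assumptions.

```matlab
nMax = 4;
P = cell(1,nMax+1);
P{1} = 1;                 % p_0(x) = 1
P{2} = [1 0];             % p_1(x) = x
for n = 1:nMax-1
    term1 = (2*n + 1)*conv([1 0],P{n+1});    % (2n+1)*x*p_n
    term2 = n*[0 0 P{n}];                    % n*p_{n-1}, padded to equal length
    P{n+2} = (term1 - term2)/(n + 1);        % p_{n+1}
end
p2 = P{3}                 % coefficients of (3x^2 - 1)/2
p3 = P{4}                 % coefficients of (5x^3 - 3x)/2
z3 = sort(roots(P{4}))    % zeros of p_3
```

The zeros of $p_3$ should coincide with the $n = 3$ abscissas of Gauss-Legendre quadrature ($0$ and $\pm 0.774\,597$).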
The classical orthogonal polynomials are also obtainable from the formulas

$$p_n(x) = \frac{(-1)^n}{2^n n!}\frac{d^n}{dx^n}\left[ (1 - x^2)^n \right]$$

$$T_n(x) = \cos(n\cos^{-1}x), \quad n > 0$$

$$L_n(x) = \frac{e^x}{n!}\frac{d^n}{dx^n}\left( x^n e^{-x} \right)$$

$$H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}\left( e^{-x^2} \right) \tag{6.20}$$

and their derivatives can be calculated from

$$(1 - x^2)p_n'(x) = n\left[ -x p_n(x) + p_{n-1}(x) \right]$$

$$(1 - x^2)T_n'(x) = n\left[ -x T_n(x) + T_{n-1}(x) \right]$$

$$x L_n'(x) = n\left[ L_n(x) - L_{n-1}(x) \right]$$

$$H_n'(x) = 2n H_{n-1}(x) \tag{6.21}$$

Other properties of orthogonal polynomials that have relevance to Gaussian
integration are:

• $\varphi_n(x)$ has $n$ real, distinct zeros in the interval $(a, b)$.

• The zeros of $\varphi_n(x)$ lie between the zeros of $\varphi_{n+1}(x)$.

• Any polynomial $P_n(x)$ of degree $n$ can be expressed in the form

$$P_n(x) = \sum_{i=0}^{n} c_i\varphi_i(x) \tag{6.22}$$

• It follows from Eq. (6.22) and the orthogonality property in Eq. (6.18) that

$$\int_a^b w(x) P_n(x)\varphi_{n+m}(x)\,dx = 0, \quad m > 0 \tag{6.23}$$

*Determination of Nodal Abscissas and Weights

Theorem The nodal abscissas $x_1, x_2, \ldots, x_n$ are the zeros of the polynomial $\varphi_n(x)$ that
belongs to the orthogonal set defined in Eq. (6.18).

Proof We start the proof by letting $f(x) = P_{2n-1}(x)$ be a polynomial of degree $2n - 1$.
Since the Gaussian integration with $n$ nodes is exact for this polynomial, we have

$$\int_a^b w(x) P_{2n-1}(x)\,dx = \sum_{i=1}^{n} A_i P_{2n-1}(x_i) \tag{a}$$

A polynomial of degree $2n - 1$ can always be written in the form

$$P_{2n-1}(x) = Q_{n-1}(x) + R_{n-1}(x)\varphi_n(x) \tag{b}$$

where $Q_{n-1}(x)$, $R_{n-1}(x)$ and $\varphi_n(x)$ are polynomials of the degree indicated by the
subscripts.¹¹ Therefore,

$$\int_a^b w(x) P_{2n-1}(x)\,dx = \int_a^b w(x) Q_{n-1}(x)\,dx + \int_a^b w(x) R_{n-1}(x)\varphi_n(x)\,dx$$

But according to Eq. (6.23) the second integral on the right-hand side vanishes,
so that

$$\int_a^b w(x) P_{2n-1}(x)\,dx = \int_a^b w(x) Q_{n-1}(x)\,dx \tag{c}$$

Because a polynomial of degree $n - 1$ is uniquely defined by $n$ points, it is always
possible to find $A_i$ such that

$$\int_a^b w(x) Q_{n-1}(x)\,dx = \sum_{i=1}^{n} A_i Q_{n-1}(x_i) \tag{d}$$

In order to arrive at Eq. (a), we must choose for the nodal abscissas $x_i$ the roots of
$\varphi_n(x) = 0$. According to Eq. (b) we then have

$$P_{2n-1}(x_i) = Q_{n-1}(x_i), \quad i = 1, 2, \ldots, n \tag{e}$$

which together with Eqs. (c) and (d) leads to

$$\int_a^b w(x) P_{2n-1}(x)\,dx = \int_a^b w(x) Q_{n-1}(x)\,dx = \sum_{i=1}^{n} A_i P_{2n-1}(x_i)$$

This completes the proof.

Theorem

$$A_i = \int_a^b w(x)\ell_i(x)\,dx, \quad i = 1, 2, \ldots, n \tag{6.24}$$

where $\ell_i(x)$ are the Lagrange’s cardinal functions spanning the nodes at
$x_1, x_2, \ldots, x_n$. These functions were defined in Eq. (3.2).

Proof Applying Lagrange’s formula, Eq. (3.1a), to $Q_{n-1}(x)$ yields

$$Q_{n-1}(x) = \sum_{i=1}^{n} Q_{n-1}(x_i)\ell_i(x)$$

which upon substitution in Eq. (d) gives us

$$\sum_{i=1}^{n}\left[ Q_{n-1}(x_i)\int_a^b w(x)\ell_i(x)\,dx \right] = \sum_{i=1}^{n} A_i Q_{n-1}(x_i)$$

or

$$\sum_{i=1}^{n} Q_{n-1}(x_i)\left[ A_i - \int_a^b w(x)\ell_i(x)\,dx \right] = 0$$

This equation can be satisfied for arbitrary $Q_{n-1}$ only if

$$A_i - \int_a^b w(x)\ell_i(x)\,dx = 0, \quad i = 1, 2, \ldots, n$$

which is equivalent to Eq. (6.24).

11 It can be shown that $Q_{n-1}(x)$ and $R_{n-1}(x)$ are unique for given $P_{2n-1}(x)$ and $\varphi_n(x)$.

It is not difficult to compute the zeros $x_i$, $i = 1, 2, \ldots, n$ of a polynomial $\varphi_n(x)$
belonging to an orthogonal set by one of the methods discussed in Chapter 4. Once
the zeros are known, the weights $A_i$, $i = 1, 2, \ldots, n$ could be found from Eq. (6.24).
However, the following formulas (given without proof) are easier to compute:

$$\text{Gauss-Legendre} \qquad A_i = \frac{2}{(1 - x_i^2)\left[ p_n'(x_i) \right]^2}$$

$$\text{Gauss-Laguerre} \qquad A_i = \frac{1}{x_i\left[ L_n'(x_i) \right]^2} \tag{6.25}$$

$$\text{Gauss-Hermite} \qquad A_i = \frac{2^{n+1} n!\sqrt{\pi}}{\left[ H_n'(x_i) \right]^2}$$
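To show how the recurrence relation, the zeros and Eq. (6.25) fit together, the following sketch computes the Gauss-Hermite abscissas and weights for $n = 3$. It assumes the Hermite recurrence $H_{k+1}(x) = 2xH_k(x) - 2kH_{k-1}(x)$ and uses the general-purpose `roots` function instead of a method from Chapter 4; the names are illustrative.

```matlab
n = 3;
Hprev = 1;                % H_0(x) = 1
H     = [2 0];            % H_1(x) = 2x
for k = 1:n-1
    Hnext = 2*conv([1 0],H) - 2*k*[0 0 Hprev];   % H_{k+1} = 2x*H_k - 2k*H_{k-1}
    Hprev = H;  H = Hnext;
end
x  = sort(roots(H));                             % nodal abscissas = zeros of H_n
dH = polyval(polyder(H),x);                      % H_n'(x_i)
A  = 2^(n+1)*factorial(n)*sqrt(pi)./dH.^2;       % Gauss-Hermite weights, Eq. (6.25)
[x A]
```

The weights should add up to $\int_{-\infty}^{\infty} e^{-x^2}dx = \sqrt{\pi}$, which provides a quick consistency check.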


Abscissas and Weights for Gaussian Quadratures 

We list here some classical Gaussian integration formulas. The tables of nodal
abscissas and weights, covering $n = 2$ to 6, have been rounded off to six decimal places.
These tables should be adequate for hand computation, but in programming you
may need more precision or a larger number of nodes. In that case you should consult
other references,¹² or use a subroutine to compute the abscissas and weights within
the integration program.¹³

The truncation error in Gaussian quadrature

$$E = \int_a^b w(x) f(x)\,dx - \sum_{i=1}^{n} A_i f(x_i)$$

has the form $E = K(n) f^{(2n)}(c)$, where $a < c < b$ (the value of $c$ is unknown; only its
bounds are given). The expression for $K(n)$ depends on the particular quadrature
being used. If the derivatives of $f(x)$ can be evaluated, the error formulas are useful
in estimating the error bounds.

12 Handbook of Mathematical Functions, M. Abramowitz and I.A. Stegun, Dover Publications (1965);
A.H. Stroud and D. Secrest, Gaussian Quadrature Formulas, Prentice-Hall (1966).

13 Several such subroutines are listed in Numerical Recipes in Fortran 90, W.H. Press et al., Cambridge
University Press (1996).
Gauss-Legendre quadrature

$$\int_{-1}^{1} f(\xi)\,d\xi \approx \sum_{i=1}^{n} A_i f(\xi_i) \tag{6.26}$$

±ξ_i         A_i
n = 2
0.577 350    1.000 000
n = 3
0.000 000    0.888 889
0.774 597    0.555 556
n = 4
0.339 981    0.652 145
0.861 136    0.347 855
n = 5
0.000 000    0.568 889
0.538 469    0.478 629
0.906 180    0.236 927
n = 6
0.238 619    0.467 914
0.661 209    0.360 762
0.932 470    0.171 324

Table 6.3

This is the most often used Gaussian integration formula. The nodes are arranged
symmetrically about $\xi = 0$, and the weights associated with a symmetric pair of nodes
are equal. For example, for $n = 2$ we have $\xi_1 = -\xi_2$ and $A_1 = A_2$. The truncation error
in Eq. (6.26) is

$$E = \frac{2^{2n+1}(n!)^4}{(2n+1)\left[ (2n)! \right]^3} f^{(2n)}(c), \quad -1 < c < 1 \tag{6.27}$$

To apply Gauss-Legendre quadrature to the integral $\int_a^b f(x)\,dx$, we must first map
the integration range $(a, b)$ into the “standard” range $(-1, 1)$. We can accomplish this
by the transformation

$$x = \frac{b + a}{2} + \frac{b - a}{2}\xi \tag{6.28}$$

Now $dx = d\xi\,(b - a)/2$, and the quadrature becomes

$$\int_a^b f(x)\,dx \approx \frac{b - a}{2}\sum_{i=1}^{n} A_i f(x_i) \tag{6.29}$$

where the abscissas $x_i$ must be computed from Eq. (6.28). The truncation error
here is

$$E = \frac{(b - a)^{2n+1}(n!)^4}{(2n+1)\left[ (2n)! \right]^3} f^{(2n)}(c), \quad a < c < b \tag{6.30}$$
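As a small illustration of Eqs. (6.28)-(6.30), the sketch below applies the three-node rule from Table 6.3 to $\int_0^\pi \sin x\,dx = 2$ and compares the actual error with the bound from Eq. (6.30), using $|f^{(6)}(c)| \le 1$. The test integral and the variable names are assumptions made for this example.

```matlab
n  = 3;
xi = [-0.774597 0 0.774597];       % Table 6.3, n = 3
A  = [ 0.555556 0.888889 0.555556];
a = 0; b = pi;

x = (b + a)/2 + (b - a)/2*xi;      % map nodes to (a,b), Eq. (6.28)
I = (b - a)/2*sum(A.*sin(x))       % quadrature, Eq. (6.29)

err    = abs(I - 2)                % actual error
Ebound = (b - a)^(2*n+1)*factorial(n)^4/((2*n+1)*factorial(2*n)^3)   % Eq. (6.30) with |f^(6)| <= 1
```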
Gauss-Chebyshev quadrature

$$\int_{-1}^{1} (1 - x^2)^{-1/2} f(x)\,dx \approx \frac{\pi}{n}\sum_{i=1}^{n} f(x_i) \tag{6.31}$$

Note that all the weights are equal: $A_i = \pi/n$. The abscissas of the nodes, which are
symmetric about $x = 0$, are given by

$$x_i = \cos\frac{(2i - 1)\pi}{2n} \tag{6.32}$$

The truncation error is

$$E = \frac{2\pi}{2^{2n}(2n)!} f^{(2n)}(c), \quad -1 < c < 1 \tag{6.33}$$


Gauss-Laguerre quadrature

$$\int_0^\infty e^{-x} f(x)\,dx \approx \sum_{i=1}^{n} A_i f(x_i) \tag{6.34}$$

x_i           A_i                x_i           A_i
n = 2                            n = 5
0.585 786     0.853 554          0.263 560     0.521 756
3.414 214     0.146 447          1.413 403     0.398 667
n = 3                            3.596 426     (-1)0.759 424
0.415 775     0.711 093          7.085 810     (-2)0.361 175
2.294 280     0.278 517          12.640 801    (-4)0.233 670
6.289 945     (-1)0.103 892      n = 6
n = 4                            0.222 847     0.458 964
0.322 548     0.603 154          1.188 932     0.417 000
1.745 761     0.357 418          2.992 736     0.113 373
4.536 620     (-1)0.388 791      5.775 144     (-1)0.103 992
9.395 071     (-3)0.539 295      9.837 467     (-3)0.261 017
                                 15.982 874    (-6)0.898 548

Table 6.4. Multiply numbers by 10^k, where k is given in parentheses

$$E = \frac{(n!)^2}{(2n)!} f^{(2n)}(c), \quad 0 < c < \infty \tag{6.35}$$


Gauss-Hermite quadrature

$$\int_{-\infty}^{\infty} e^{-x^2} f(x)\,dx \approx \sum_{i=1}^{n} A_i f(x_i) \tag{6.36}$$

The nodes are placed symmetrically about $x = 0$, each symmetric pair having the
same weight.

±x_i         A_i
n = 2
0.707 107    0.886 227
n = 3
0.000 000    1.181 636
1.224 745    0.295 409
n = 4
0.524 648    0.804 914
1.650 680    (-1)0.813 128
n = 5
0.000 000    0.945 308
0.958 572    0.393 619
2.020 183    (-1)0.199 532
n = 6
0.436 077    0.724 629
1.335 849    0.157 067
2.350 605    (-2)0.453 001

Table 6.5. Multiply numbers by 10^k, where k is given in parentheses

$$E = \frac{\sqrt{\pi}\,n!}{2^n(2n)!} f^{(2n)}(c), \quad 0 < c < \infty \tag{6.37}$$


Gauss quadrature with logarithmic singularity

$$\int_0^1 f(x)\ln(x)\,dx \approx -\sum_{i=1}^{n} A_i f(x_i) \tag{6.38}$$

x_i              A_i                x_i              A_i
n = 2                               n = 5
0.112 009        0.718 539          (-1)0.291 345    0.297 893
0.602 277        0.281 461          0.173 977        0.349 776
n = 3                               0.411 703        0.234 488
(-1)0.638 907    0.513 405          0.677 314        (-1)0.989 305
0.368 997        0.391 980          0.894 771        (-1)0.189 116
0.766 880        (-1)0.946 154      n = 6
n = 4                               (-1)0.216 344    0.238 764
(-1)0.414 485    0.383 464          0.129 583        0.308 287
0.245 275        0.386 875          0.314 020        0.245 317
0.556 165        0.190 435          0.538 657        0.142 009
0.848 982        (-1)0.392 255      0.756 916        (-1)0.554 546
                                    0.922 669        (-1)0.101 690

Table 6.6. Multiply numbers by 10^k, where k is given in parentheses

(6.39)

where $k(2) = 0.002\,85$, $k(3) = 0.000\,17$, $k(4) = 0.000\,01$.

■ gaussNodes 

The function gaussNodes computes the nodal abscissas $x_i$ and the corresponding
weights $A_i$ used in Gauss-Legendre quadrature.¹⁴ It can be shown that the
approximate values of the abscissas are

$$x_i = \cos\frac{\pi(i - 0.25)}{n + 0.5}$$

Using these approximations as the starting values, we compute the nodal
abscissas by finding the nonnegative zeros of the Legendre polynomial $p_n(x)$ with
the Newton-Raphson method (the negative zeros are obtained from symmetry).
Note that gaussNodes calls the subfunction legendre, which returns $p_n(t)$ and its
derivative.

function [x,A] = gaussNodes(n,tol)
% Computes nodal abscissas x and weights A of
% Gauss-Legendre n-point quadrature.
% USAGE: [x,A] = gaussNodes(n,tol)
% tol = error tolerance (default is 1.0e4*eps).

if nargin < 2; tol = 1.0e4*eps; end
A = zeros(n,1); x = zeros(n,1);
nRoots = fix((n + 1)/2);               % Number of non-neg. roots
for i = 1:nRoots
    t = cos(pi*(i - 0.25)/(n + 0.5));  % Approx. roots
    for j = 1:30
        [p,dp] = legendre(t,n);        % Newton's
        dt = -p/dp; t = t + dt;        % root finding
        if abs(dt) < tol               % method
            x(i) = t; x(n-i+1) = -t;
            A(i) = 2/(1-t^2)/dp^2;     % Eq. (6.25)
            A(n-i+1) = A(i);
            break
        end
    end
end

function [p,dp] = legendre(t,n)
% Evaluates Legendre polynomial p of degree n
% and its derivative dp at x = t.
p0 = 1.0; p1 = t;
for k = 1:n-1
    p = ((2*k + 1)*t*p1 - k*p0)/(k + 1);   % Eq. (6.19)
    p0 = p1; p1 = p;
end
dp = n*(p0 - t*p1)/(1 - t^2);              % Eq. (6.21)

14 This function is an adaptation of a routine in Numerical Recipes in Fortran 90, W.H. Press et al.,
Cambridge University Press (1996).

■ gaussQuad 

The function gaussQuad evaluates $\int_a^b f(x)\,dx$ with Gauss-Legendre quadrature using
$n$ nodes. The function defining $f(x)$ must be supplied by the user. The nodal
abscissas and the weights are obtained by calling gaussNodes.

function I = gaussQuad(func,a,b,n)
% Gauss-Legendre quadrature.
% USAGE: I = gaussQuad(func,a,b,n)
% INPUT:
% func = handle of function to be integrated.
% a,b  = integration limits.
% n    = order of integration.
% OUTPUT:
% I    = integral

c1 = (b + a)/2; c2 = (b - a)/2;        % Mapping constants
[x,A] = gaussNodes(n);                 % Nodal abscissas & weights
sum = 0;
for i = 1:length(x)
    y = feval(func,c1 + c2*x(i));      % Function at node i
    sum = sum + A(i)*y;
end
I = c2*sum;

EXAMPLE 6.8 

Evaluate $\int_{-1}^{1} (1 - x^2)^{3/2}\,dx$ as accurately as possible with Gaussian integration.

Solution As the integrand is smooth and free of singularities, we could use Gauss-Legendre
quadrature. However, the exact integral can be obtained with the Gauss-Chebyshev
formula. We write

$$\int_{-1}^{1} (1 - x^2)^{3/2}\,dx = \int_{-1}^{1} \frac{(1 - x^2)^2}{\sqrt{1 - x^2}}\,dx$$

The numerator $f(x) = (1 - x^2)^2$ is a polynomial of degree four, so that Gauss-Chebyshev
quadrature is exact with three nodes.

The abscissas of the nodes are obtained from Eq. (6.32). Substituting $n = 3$, we
get

$$x_i = \cos\frac{(2i - 1)\pi}{2(3)}, \quad i = 1, 2, 3$$

Therefore,

$$x_1 = \cos\frac{\pi}{6} = \frac{\sqrt{3}}{2} \qquad x_2 = \cos\frac{\pi}{2} = 0 \qquad x_3 = \cos\frac{5\pi}{6} = -\frac{\sqrt{3}}{2}$$

and Eq. (6.31) yields

$$\int_{-1}^{1} (1 - x^2)^{3/2}\,dx = \frac{\pi}{3}\sum_{i=1}^{3} (1 - x_i^2)^2
= \frac{\pi}{3}\left[ \left(1 - \frac{3}{4}\right)^2 + (1 - 0)^2 + \left(1 - \frac{3}{4}\right)^2 \right] = \frac{3\pi}{8}$$
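The same three-node Gauss-Chebyshev computation in MATLAB, as a minimal sketch with illustrative names:

```matlab
n = 3;
i = 1:n;
x = cos((2*i - 1)*pi/(2*n));       % nodal abscissas, Eq. (6.32)
f = @(x) (1 - x.^2).^2;            % numerator of the integrand
I = pi/n*sum(f(x))                 % Gauss-Chebyshev quadrature, Eq. (6.31)
exact = 3*pi/8
```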


EXAMPLE 6.9 

Use Gaussian integration to evaluate / 0 ° 5 cos 7rx In x dx. 

Solution We split the integral into two parts: 

/*0.5 pi pi 

/ cosjrxlnxrfx= / cos7rxlnxdx— / cos7rxlnxdx 
Jo Jo J 0.5 

The first integral on the right-hand side, which contains a logarithmic singularity at 
x = 0, can be computed with the special Gaussian quadrature in Eq. (6.38). Choosing 
n = 4, we have 
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where x ,• and A,- are given in Table 6.6. The sum is evaluated in the following table: 


Xi 

COS ITXf 

A 

Ai cos nXi 

0.041448 

0.991 534 

0.383 464 

0.380 218 

0.245 275 

0.717 525 

0.386875 

0.277 592 

0.556165 

-0.175 533 

0.190 435 

-0.033 428 

0.848 982 

-0.889 550 

0.039 225 

-0.034 892 

E = 0.589 490 


Thus 


f 


cosnxlnxdx ss —0.589 490 


The second integral is free of singularities, so that it can be evaluated with Gauss-Legendre quadrature. Choosing again n = 4, we have

$$\int_{0.5}^{1}\cos\pi x\,\ln x\,dx \approx 0.25\sum_{i=1}^{4} A_i\cos\pi x_i\,\ln x_i$$

where the nodal abscissas are (see Eq. (6.28))

$$x_i = \frac{1 + 0.5}{2} + \frac{1 - 0.5}{2}\,\xi_i = 0.75 + 0.25\,\xi_i$$

Looking up $\xi_i$ and $A_i$ in Table 6.3 leads to the following computations:

    xi_i         x_i         cos(pi x_i) ln x_i    A_i         A_i cos(pi x_i) ln x_i
    -0.861136    0.534716    0.068141              0.347855    0.023703
    -0.339981    0.665005    0.202133              0.652145    0.131820
     0.339981    0.834995    0.156638              0.652145    0.102151
     0.861136    0.965284    0.035123              0.347855    0.012218
                                                        Sum =  0.269892

from which

$$\int_{0.5}^{1}\cos\pi x\,\ln x\,dx \approx 0.25(0.269892) = 0.067473$$

Therefore,

$$\int_0^{0.5}\cos\pi x\,\ln x\,dx \approx -0.589490 - 0.067473 = -0.656963$$

which is correct to six decimal places.
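
The two tabulated sums can be reproduced with the short script below. It is a sketch rather than a book listing: the nodes and weights are typed in from the tables of this example (six-figure values), so the result can only be expected to match the quoted figure to about that precision, and all variable names are illustrative.

```matlab
% Part 1: quadrature with logarithmic singularity on (0,1), n = 4
xLog = [0.041448 0.245275 0.556165 0.848982];   % nodes used in the example
ALog = [0.383464 0.386875 0.190435 0.039225];   % corresponding weights
I1 = -sum(ALog.*cos(pi*xLog));                  % note the minus sign

% Part 2: Gauss-Legendre quadrature on (0.5,1), n = 4
xi = [-0.861136 -0.339981 0.339981 0.861136];   % standard nodes on (-1,1)
A  = [ 0.347855  0.652145 0.652145 0.347855];   % standard weights
a = 0.5; b = 1;
x = (b + a)/2 + (b - a)/2*xi;                   % map nodes to (a,b)
I2 = (b - a)/2*sum(A.*cos(pi*x).*log(x));

I = I1 - I2;                                    % integral from 0 to 0.5
fprintf('I1 = %.6f  I2 = %.6f  I = %.6f\n', I1, I2, I)
```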
EXAMPLE 6.10

Evaluate as accurately as possible

$$F = \int_0^{\infty}\frac{x + 3}{\sqrt{x}}\,e^{-x}\,dx$$

Solution In its present form, the integral is not suited to any of the Gaussian quadratures listed in this article. But using the transformation

$$x = t^2 \qquad dx = 2t\,dt$$

we have

$$F = 2\int_0^{\infty}(t^2 + 3)\,e^{-t^2}\,dt = \int_{-\infty}^{\infty}(t^2 + 3)\,e^{-t^2}\,dt$$

which can be evaluated exactly with the Gauss-Hermite formula using only two nodes (n = 2). Thus

$$F = A_1(t_1^2 + 3) + A_2(t_2^2 + 3)$$
$$= 0.886227\left[(0.707107)^2 + 3\right] + 0.886227\left[(-0.707107)^2 + 3\right]$$
$$= 6.20359$$
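
As a quick check of this two-node evaluation, the sketch below (not a book listing; variable names are illustrative) uses the closed-form two-point Gauss-Hermite data $t = \pm 1/\sqrt{2}$, $A = \sqrt{\pi}/2$, which are the values 0.707107 and 0.886227 quoted above, and compares the sum with the analytical value $3.5\sqrt{\pi}$ of $\int_{-\infty}^{\infty}(t^2 + 3)e^{-t^2}dt$.

```matlab
% Two-node Gauss-Hermite quadrature for (t^2 + 3)*exp(-t^2) on (-inf,inf)
t = [1 -1]/sqrt(2);          % nodes: +/- 0.707107
A = [1 1]*sqrt(pi)/2;        % weights: 0.886227 each
F = sum(A.*(t.^2 + 3));      % quadrature sum
Fexact = 3.5*sqrt(pi);       % sqrt(pi)/2 + 3*sqrt(pi)
fprintf('F = %.5f, exact = %.5f\n', F, Fexact)
```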

EXAMPLE 6.11 

Determine how many nodes are required to evaluate

$$\int_0^{\pi}\left(\frac{\sin x}{x}\right)^2 dx$$

with Gauss-Legendre quadrature to six decimal places. The exact integral, rounded to six places, is 1.41815.

Solution The integrand is a smooth function; hence it is suited for Gauss-Legendre integration. There is an indeterminacy at x = 0, but this does not bother the quadrature since the integrand is never evaluated at that point. We used the following program that computes the quadrature with 2, 3, ... nodes until the desired accuracy is reached:

% Example 6.11 (Gauss-Legendre quadrature)
a = 0; b = pi; Iexact = 1.41815;
for n = 2:12
    I = gaussQuad(@fex6_11,a,b,n);
    if abs(I - Iexact) < 0.00001
        I
        n
        break
    end
end

The M-file of the function integrated is

function y = fex6_11(x)
% Function used in Example 6.11
y = (sin(x)/x)^2;

The program produced the following output: 


I = 

1.41815026780139 

n = 

5 


EXAMPLE 6.12 

Evaluate numerically $\int_{1.5}^{3} f(x)\,dx$, where $f(x)$ is represented by the unevenly spaced data

    x       1.2        1.7       2.0       2.4       2.9       3.3
    f(x)   -0.36236    0.12884   0.41615   0.73739   0.97096   0.98748

Knowing that the data points lie on the curve $f(x) = -\cos x$, evaluate the accuracy of the solution.

Solution We approximate $f(x)$ by the polynomial $P_5(x)$ that intersects all the data points, and then evaluate $\int_{1.5}^{3} f(x)\,dx \approx \int_{1.5}^{3} P_5(x)\,dx$ with the Gauss-Legendre formula. Since the polynomial is of degree five, only three nodes (n = 3) are required in the quadrature.

From Eq. (6.28) and Table 6.3, we obtain for the abscissas of the nodes

$$x_1 = \frac{3 + 1.5}{2} + \frac{3 - 1.5}{2}(-0.774597) = 1.6691$$
$$x_2 = \frac{3 + 1.5}{2} = 2.25$$
$$x_3 = \frac{3 + 1.5}{2} + \frac{3 - 1.5}{2}(0.774597) = 2.8309$$

We now compute the values of the interpolant $P_5(x)$ at the nodes. This can be done using the functions newtonPoly or neville listed in Art. 3.2. The results are

$$P_5(x_1) = 0.09808 \qquad P_5(x_2) = 0.62816 \qquad P_5(x_3) = 0.95216$$

Using Gauss-Legendre quadrature

$$I = \int_{1.5}^{3} P_5(x)\,dx = \frac{3 - 1.5}{2}\sum_{i=1}^{3} A_i P_5(x_i)$$

we get

$$I = 0.75\left[0.555556(0.09808) + 0.888889(0.62816) + 0.555556(0.95216)\right] = 0.85637$$

Comparison with $-\int_{1.5}^{3}\cos x\,dx = 0.85638$ shows that the discrepancy is within the roundoff error.
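
The same computation can be sketched in a few lines. Note the substitution: instead of newtonPoly or neville, this sketch uses MATLAB's built-in polyfit and polyval to form and evaluate the degree-five interpolant, which is an assumption of convenience and not the book's approach; variable names are illustrative.

```matlab
% Data of Example 6.12
xData = [1.2 1.7 2.0 2.4 2.9 3.3];
yData = [-0.36236 0.12884 0.41615 0.73739 0.97096 0.98748];
p = polyfit(xData,yData,5);        % degree-5 polynomial through all 6 points

% Three-node Gauss-Legendre quadrature on (1.5,3)
xi = [-0.774597 0 0.774597];       % standard nodes
A  = [ 0.555556 0.888889 0.555556];% standard weights
a = 1.5; b = 3;
x = (b + a)/2 + (b - a)/2*xi;      % nodes mapped to (a,b)
I = (b - a)/2*sum(A.*polyval(p,x));

Iexact = -(sin(b) - sin(a));       % integral of -cos(x) from a to b
fprintf('I = %.5f, exact = %.5f\n', I, Iexact)
```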


PROBLEM SET 6.2 


1. Evaluate 


f, 


•Tt 


In x 



with Gauss-Legendre quadrature. Use (a) two nodes and (b) four nodes. 

2. Use Gauss-Laguerre quadrature to evaluate $\int_0^{\infty}(1 - x^2)^3 e^{-x}\,dx$.

3. Use Gauss-Chebyshev quadrature with six nodes to evaluate 



Compare the result with the “exact” value 2.62206. Hint: substitute $\sin x = t^2$.

4. The integral sin x dx is evaluated with Gauss-Legendre quadrature using four 
nodes. What are the bounds on the truncation error resulting from the quadrature? 

5. How many nodes are required in Gauss-Laguerre quadrature to evaluate
$\int_0^{\infty} e^{-x}\sin x\,dx$ to six decimal places?

6. Evaluate as accurately as possible 



Hint: substitute x = (1 + t) /2. 

7. Compute sin x In xdx to four decimal places. 

8. Calculate the bounds on the truncation error if x sin xdx is evaluated with 
Gauss-Legendre quadrature using three nodes. What is the actual error? 

9. Evaluate $\int_0^{2}(\sinh x/x)\,dx$ to four decimal places.

10. Evaluate the integral

$$\int_0^{\infty}\frac{x\,dx}{e^x + 1}$$

to six decimal places. Hint: substitute $e^x = 1/t$.
11. ■ The equation of an ellipse is $x^2/a^2 + y^2/b^2 = 1$. Write a program that computes
the length


S = 



V1 + ( dy/dx ) 2 dx 


of the circumference to four decimal places for given a and b. Test the program 
with a = 2 and b= 1. 

12. ■ The error function, which is of importance in statistics, is defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^{x} e^{-t^2}\,dt$$

Write a program that uses Gauss-Legendre quadrature to evaluate erf(x) for a given x to six decimal places. Note that erf(x) = 1.000000 (correct to 6 decimal places) when x > 5. Test the program by verifying that erf(1.0) = 0.842701.



13. The sliding weight of mass m is attached to a spring of stiffness k that has an
undeformed length L. When the mass is released from rest at B, the time it takes
to reach A can be shown to be $t = C\sqrt{m/k}$, where


C = 




- 1/2 


dz 


Compute C to six decimal places. Hint: the integrand has a singularity at z = 1
that behaves as $(1 - z^2)^{-1/2}$.
14. A uniform beam forms the semiparabolic cantilever arch AB. The vertical displacement of A due to the force P can be shown to be



where EI is the bending rigidity of the beam and



Write a program that computes C(h/b) for any given value of h/b to four decimal
places. Use the program to compute C(0.5), C(1.0) and C(2.0).

15. ■ There is no elegant way to compute $I = \int_0^{\pi/2}\ln(\sin x)\,dx$. A “brute force” method
that works is to split the integral into several parts: from x = 0 to 0.01, from 0.01
to 0.2 and from x = 0.2 to $\pi/2$. In the first part we can use the approximation
$\sin x \approx x$, which allows us to obtain the integral analytically. The other two parts
can be evaluated with Gauss-Legendre quadrature. Use this method to evaluate
I to six decimal places.

16. ■ 



The pressure of wind was measured at various heights on a vertical wall, as shown
on the diagram. Find the height of the pressure center, which is defined as

$$\frac{\int_0^{112\,\mathrm{m}} h\,p(h)\,dh}{\int_0^{112\,\mathrm{m}} p(h)\,dh}$$

Hint: fit a cubic polynomial to the data and then apply Gauss-Legendre
quadrature.


*6.5 Multiple Integrals 

Multiple integrals, such as the area integral $\iint_A f(x, y)\,dx\,dy$, can also be evaluated by quadrature. The computations are straightforward if the region of integration has a simple geometric shape, such as a triangle or a quadrilateral. Due to complications in specifying the limits of integration on x and y, quadrature is not a practical means of evaluating integrals over irregular regions. However, an irregular region A can always be approximated as an assembly of triangular or quadrilateral subregions $A_1, A_2, \ldots$, called finite elements, as illustrated in Fig. 6.6. The integral over A can then be evaluated by summing the integrals over the finite elements:

$$\iint_A f(x, y)\,dx\,dy \approx \sum_i \iint_{A_i} f(x, y)\,dx\,dy$$

Volume integrals can be computed in a similar manner, using tetrahedra or rectangular prisms for the finite elements.



Figure 6.6. Finite element model of an irregular region.


Gauss-Legendre Quadrature over a Quadrilateral Element 



Figure 6.7. Mapping a quadrilateral into the standard rectangle. 

Consider the double integral

$$I = \int_{-1}^{1}\int_{-1}^{1} f(\xi, \eta)\,d\xi\,d\eta$$

over the rectangular element shown in Fig. 6.7(a). Evaluating each integral in turn by Gauss-Legendre quadrature using n nodes in each coordinate direction, we obtain

$$I = \int_{-1}^{1}\sum_{i=1}^{n} A_i f(\xi_i, \eta)\,d\eta = \sum_{j=1}^{n} A_j\left[\sum_{i=1}^{n} A_i f(\xi_i, \eta_j)\right]$$

or

$$I = \sum_{i=1}^{n}\sum_{j=1}^{n} A_i A_j f(\xi_i, \eta_j) \tag{6.40}$$


The number of integration points n in each coordinate direction is called the integration order. Figure 6.7(a) shows the locations of the integration points used in third-order integration (n = 3). Because the integration limits were the “standard” limits (-1, 1) of Gauss-Legendre quadrature, the weights and the coordinates of the integration points are as listed in Table 6.3.

In order to apply quadrature to the quadrilateral element in Fig. 6.7(b), we must first map the quadrilateral into the “standard” rectangle in Fig. 6.7(a). By mapping we mean a coordinate transformation $x = x(\xi, \eta)$, $y = y(\xi, \eta)$ that results in one-to-one correspondence between points in the quadrilateral and in the rectangle. The transformation that does the job is

$$x(\xi, \eta) = \sum_{k=1}^{4} N_k(\xi, \eta)\,x_k \qquad y(\xi, \eta) = \sum_{k=1}^{4} N_k(\xi, \eta)\,y_k \tag{6.41}$$

where $(x_k, y_k)$ are the coordinates of corner k of the quadrilateral and

$$N_1(\xi, \eta) = \tfrac{1}{4}(1 - \xi)(1 - \eta)$$
$$N_2(\xi, \eta) = \tfrac{1}{4}(1 + \xi)(1 - \eta) \tag{6.42}$$
$$N_3(\xi, \eta) = \tfrac{1}{4}(1 + \xi)(1 + \eta)$$
$$N_4(\xi, \eta) = \tfrac{1}{4}(1 - \xi)(1 + \eta)$$

The functions $N_k(\xi, \eta)$, known as the shape functions, are bilinear (linear in each coordinate). Consequently, straight lines remain straight upon mapping. In particular, note that the sides of the quadrilateral are mapped into the lines $\xi = \pm 1$ and $\eta = \pm 1$.

Because mapping distorts areas, an infinitesimal area element $dA = dx\,dy$ of the quadrilateral is not equal to its counterpart $d\xi\,d\eta$ of the rectangle. It can be shown that the relationship between the areas is

$$dx\,dy = |J(\xi, \eta)|\,d\xi\,d\eta \tag{6.43}$$

where

$$J(\xi, \eta) = \begin{bmatrix} \dfrac{\partial x}{\partial \xi} & \dfrac{\partial y}{\partial \xi} \\[2mm] \dfrac{\partial x}{\partial \eta} & \dfrac{\partial y}{\partial \eta} \end{bmatrix} \tag{6.44a}$$

is known as the Jacobian matrix of the mapping. Substituting from Eqs. (6.41) and (6.42) and differentiating, we find that the components of the Jacobian matrix are

$$J_{11} = \tfrac{1}{4}\left[-(1 - \eta)x_1 + (1 - \eta)x_2 + (1 + \eta)x_3 - (1 + \eta)x_4\right]$$
$$J_{12} = \tfrac{1}{4}\left[-(1 - \eta)y_1 + (1 - \eta)y_2 + (1 + \eta)y_3 - (1 + \eta)y_4\right] \tag{6.44b}$$
$$J_{21} = \tfrac{1}{4}\left[-(1 - \xi)x_1 - (1 + \xi)x_2 + (1 + \xi)x_3 + (1 - \xi)x_4\right]$$
$$J_{22} = \tfrac{1}{4}\left[-(1 - \xi)y_1 - (1 + \xi)y_2 + (1 + \xi)y_3 + (1 - \xi)y_4\right]$$

We can now write

$$\iint_A f(x, y)\,dx\,dy = \int_{-1}^{1}\int_{-1}^{1} f\left[x(\xi, \eta), y(\xi, \eta)\right]|J(\xi, \eta)|\,d\xi\,d\eta \tag{6.45}$$


Since the right-hand side integral is taken over the “standard” rectangle, it can be evaluated using Eq. (6.40). Replacing $f(\xi, \eta)$ in Eq. (6.40) by the integrand in Eq. (6.45), we get the following formula for Gauss-Legendre quadrature over a quadrilateral region:

$$I = \sum_{i=1}^{n}\sum_{j=1}^{n} A_i A_j\, f\left[x(\xi_i, \eta_j), y(\xi_i, \eta_j)\right]|J(\xi_i, \eta_j)| \tag{6.46}$$

The $\xi$ and $\eta$ coordinates of the integration points and the weights can again be obtained from Table 6.3.

■ gaussQuad2 


The function gaussQuad2 computes $\iint_A f(x, y)\,dx\,dy$ over a quadrilateral element with Gauss-Legendre quadrature of integration order n. The quadrilateral is defined by the arrays x and y, which contain the coordinates of the four corners ordered in a counterclockwise direction around the element. The determinant of the Jacobian matrix is obtained by calling jac; mapping is performed by map. The weights and the values of $\xi$ and $\eta$ at the integration points are computed by gaussNodes listed in the previous article (note that $\xi$ and $\eta$ appear as s and t in the listing).

function I = gaussQuad2(func,x,y,n)
% Gauss-Legendre quadrature over a quadrilateral.
% USAGE: I = gaussQuad2(func,x,y,n)
% INPUT:
% func = handle of function to be integrated.
% x    = [x1;x2;x3;x4] = x-coordinates of corners.
% y    = [y1;y2;y3;y4] = y-coordinates of corners.
% n    = order of integration
% OUTPUT:
% I    = integral

[t,A] = gaussNodes(n); I = 0;
for i = 1:n
    for j = 1:n
        [xNode,yNode] = map(x,y,t(i),t(j));
        z = feval(func,xNode,yNode);
        detJ = jac(x,y,t(i),t(j));
        I = I + A(i)*A(j)*detJ*z;
    end
end

function detJ = jac(x,y,s,t)
% Computes determinant of Jacobian matrix.
J = zeros(2);
J(1,1) = - (1 - t)*x(1) + (1 - t)*x(2)...
         + (1 + t)*x(3) - (1 + t)*x(4);
J(1,2) = - (1 - t)*y(1) + (1 - t)*y(2)...
         + (1 + t)*y(3) - (1 + t)*y(4);
J(2,1) = - (1 - s)*x(1) - (1 + s)*x(2)...
         + (1 + s)*x(3) + (1 - s)*x(4);
J(2,2) = - (1 - s)*y(1) - (1 + s)*y(2)...
         + (1 + s)*y(3) + (1 - s)*y(4);
detJ = (J(1,1)*J(2,2) - J(1,2)*J(2,1))/16;

function [xNode,yNode] = map(x,y,s,t)
% Computes x and y-coordinates of nodes.
N = zeros(4,1);
N(1) = (1 - s)*(1 - t)/4;
N(2) = (1 + s)*(1 - t)/4;
N(3) = (1 + s)*(1 + t)/4;
N(4) = (1 - s)*(1 + t)/4;
xNode = dot(N,x); yNode = dot(N,y);


EXAMPLE 6.13 



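Evaluate the integral

$$I = \iint_A (x^2 + y)\,dx\,dy$$

over the quadrilateral shown.

Solution The corner coordinates of the quadrilateral are

$$\mathbf{x}^T = \begin{bmatrix} 0 & 2 & 2 & 0 \end{bmatrix} \qquad \mathbf{y}^T = \begin{bmatrix} 0 & 0 & 3 & 2 \end{bmatrix}$$

The mapping is

$$x(\xi, \eta) = \sum_{k=1}^{4} N_k(\xi, \eta)\,x_k = 0 + \frac{(1 + \xi)(1 - \eta)}{4}(2) + \frac{(1 + \xi)(1 + \eta)}{4}(2) + 0 = 1 + \xi$$

$$y(\xi, \eta) = \sum_{k=1}^{4} N_k(\xi, \eta)\,y_k = 0 + 0 + \frac{(1 + \xi)(1 + \eta)}{4}(3) + \frac{(1 - \xi)(1 + \eta)}{4}(2) = \frac{(5 + \xi)(1 + \eta)}{4}$$

which yields for the Jacobian matrix

$$J(\xi, \eta) = \begin{bmatrix} \dfrac{\partial x}{\partial \xi} & \dfrac{\partial y}{\partial \xi} \\[2mm] \dfrac{\partial x}{\partial \eta} & \dfrac{\partial y}{\partial \eta} \end{bmatrix} = \begin{bmatrix} 1 & \dfrac{1 + \eta}{4} \\[2mm] 0 & \dfrac{5 + \xi}{4} \end{bmatrix}$$

Thus the area scale factor is

$$|J(\xi, \eta)| = \frac{5 + \xi}{4}$$

Now we can map the integral from the quadrilateral to the standard rectangle. Referring to Eq. (6.45), we obtain

$$I = \int_{-1}^{1}\int_{-1}^{1}\left[(1 + \xi)^2 + \frac{(5 + \xi)(1 + \eta)}{4}\right]\frac{5 + \xi}{4}\,d\xi\,d\eta$$

$$= \int_{-1}^{1}\int_{-1}^{1}\left(\frac{45}{16} + \frac{27}{8}\xi + \frac{29}{16}\xi^2 + \frac{1}{4}\xi^3 + \frac{25}{16}\eta + \frac{5}{8}\xi\eta + \frac{1}{16}\xi^2\eta\right)d\xi\,d\eta$$

Noting that only even powers of $\xi$ and $\eta$ contribute to the integral, we can simplify the integral to

$$I = \int_{-1}^{1}\int_{-1}^{1}\left(\frac{45}{16} + \frac{29}{16}\xi^2\right)d\xi\,d\eta = \frac{41}{3}$$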
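
The analytical value 41/3 can be confirmed numerically. The sketch below is not a book listing: it hard-codes the mapping of this example, $x = 1 + \xi$, $y = (5 + \xi)(1 + \eta)/4$, $|J| = (5 + \xi)/4$, and applies Eq. (6.40) with the three-point Gauss-Legendre nodes $0, \pm\sqrt{3/5}$ and weights $8/9, 5/9$. The variable names are illustrative.

```matlab
% Third-order Gauss-Legendre quadrature of the mapped integrand
xi = [-sqrt(3/5) 0 sqrt(3/5)];    % nodes on (-1,1)
A  = [5/9 8/9 5/9];               % weights
I = 0;
for i = 1:3
    for j = 1:3
        s = xi(i); t = xi(j);             % s, t stand for xi, eta
        x = 1 + s;                        % mapping of this example
        y = (5 + s)*(1 + t)/4;
        detJ = (5 + s)/4;                 % area scale factor
        I = I + A(i)*A(j)*(x^2 + y)*detJ;
    end
end
fprintf('I = %.6f, 41/3 = %.6f\n', I, 41/3)
```

The mapped integrand is a cubic in $\xi$ and linear in $\eta$, so three nodes per direction integrate it without truncation error.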


EXAMPLE 6.14 

Evaluate the integral

$$\int_{-1}^{1}\int_{-1}^{1}\cos\frac{\pi x}{2}\cos\frac{\pi y}{2}\,dx\,dy$$

by Gauss-Legendre quadrature of order three.

Solution From the quadrature formula in Eq. (6.40), we have

$$I = \sum_{i=1}^{3}\sum_{j=1}^{3} A_i A_j\cos\frac{\pi x_i}{2}\cos\frac{\pi y_j}{2}$$

The integration points are shown in the figure; their coordinates and the corresponding weights are listed in Table 6.3. Note that the integrand, the integration points and the weights are all symmetric about the coordinate axes. It follows that the points labeled a contribute equal amounts to I; the same is true for the points labeled b. Therefore,

$$I = 4(0.555556)^2\cos^2\frac{\pi(0.774597)}{2} + 4(0.555556)(0.888889)\cos\frac{\pi(0.774597)}{2}\cos\frac{\pi(0)}{2} + (0.888889)^2\cos^2\frac{\pi(0)}{2}$$
$$= 1.623391$$

The exact value of the integral is $16/\pi^2 \approx 1.621139$.
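
The double sum of Eq. (6.40) for this integrand takes only a few lines. The following is a sketch (not a book listing) that uses the six-figure third-order nodes 0, ±0.774597 and weights 0.888889, 0.555556 quoted in the example; variable names are illustrative.

```matlab
% Third-order Gauss-Legendre quadrature of cos(pi*x/2)*cos(pi*y/2) on the square
xi = [-0.774597 0 0.774597];         % nodes in each direction
A  = [ 0.555556 0.888889 0.555556];  % weights
I = 0;
for i = 1:3
    for j = 1:3
        I = I + A(i)*A(j)*cos(pi*xi(i)/2)*cos(pi*xi(j)/2);
    end
end
Iexact = 16/pi^2;
fprintf('I = %.6f, exact = %.6f, error = %.2e\n', I, Iexact, I - Iexact)
```

Since the integrand is not a polynomial, a small truncation error remains, in contrast to the polynomial examples.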

EXAMPLE 6.15 



Utilize gaussQuad2 to evaluate $I = \iint_A f(x, y)\,dx\,dy$ over the quadrilateral shown, where

$$f(x, y) = (x - 2)^2(y - 2)^2$$

Use enough integration points for an “exact” answer.

Solution The required integration order is determined by the integrand in Eq. (6.45):

$$I = \int_{-1}^{1}\int_{-1}^{1} f\left[x(\xi, \eta), y(\xi, \eta)\right]|J(\xi, \eta)|\,d\xi\,d\eta \tag{a}$$

We note that $|J(\xi, \eta)|$, defined in Eqs. (6.44), is at most linear in each of $\xi$ and $\eta$ (the $\xi\eta$ terms cancel in the determinant). The specified $f(x, y)$ is biquadratic in x and y, and x and y are themselves bilinear in $\xi$ and $\eta$, so that $f[x(\xi, \eta), y(\xi, \eta)]$ is a polynomial of degree 4 in both $\xi$ and $\eta$. The integrand in Eq. (a) is therefore of degree 5 at most in each variable. Thus third-order integration (n = 3), which is exact for polynomials up to degree 2n - 1 = 5, is sufficient for an “exact” result. Here is the MATLAB command that performs the integration:

>> I = gaussQuad2(@fex6_15,[0;4;4;1],[0;1;4;3],3)
I =
   11.3778

The M-file that returns the function to be integrated is

function z = fex6_15(x,y)
% Function used in Example 6.15.
z = ((x - 2)*(y - 2))^2;

Quadrature over a Triangular Element 


Figure 6.8. Quadrilateral with two coincident corners.

A triangle may be viewed as a degenerate quadrilateral with two of its corners occupying the same location, as illustrated in Fig. 6.8. Therefore, the integration formulas over a quadrilateral region can also be used for a triangular element. However, it is computationally advantageous to use integration formulas specially developed for triangles, which we present without derivation.$^{15}$



Consider the triangular element in Fig. 6.9. Drawing straight lines from the point P in the triangle to each of the corners divides the triangle into three parts with areas $A_1$, $A_2$ and $A_3$. The so-called area coordinates of P are defined as

$$\alpha_i = \frac{A_i}{A}, \qquad i = 1, 2, 3 \tag{6.47}$$

where A is the area of the element. Since $A_1 + A_2 + A_3 = A$, the area coordinates are related by

$$\alpha_1 + \alpha_2 + \alpha_3 = 1 \tag{6.48}$$

Note that $\alpha_i$ ranges from 0 (when P lies on the side opposite to corner i) to 1 (when P is at corner i).


15 The triangle formulas are extensively used in finite element analysis. See, for example, O.C. Zienkiewicz and R.L. Taylor, The Finite Element Method, Vol. 1, 4th ed., McGraw-Hill (1989).
A convenient formula for computing A from the corner coordinates $(x_i, y_i)$ is

$$A = \frac{1}{2}\begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} \tag{6.49}$$

The area coordinates are mapped into the Cartesian coordinates by

$$x(\alpha_1, \alpha_2, \alpha_3) = \sum_{i=1}^{3}\alpha_i x_i \qquad y(\alpha_1, \alpha_2, \alpha_3) = \sum_{i=1}^{3}\alpha_i y_i \tag{6.50}$$

The integration formula over the element is

$$\iint_A f\left[x(\boldsymbol{\alpha}), y(\boldsymbol{\alpha})\right]dA = A\sum_k W_k\, f\left[x(\boldsymbol{\alpha}_k), y(\boldsymbol{\alpha}_k)\right] \tag{6.51}$$

where $\boldsymbol{\alpha}_k$ represents the area coordinates of the integration point k, and $W_k$ are the weights. The locations of the integration points are shown in Fig. 6.10, and the corresponding values of $\boldsymbol{\alpha}_k$ and $W_k$ are listed in Table 6.7. The quadrature in Eq. (6.51) is exact if $f(x, y)$ is a polynomial of the degree indicated.



Figure 6.10. Integration points of triangular elements: (a) linear, (b) quadratic, (c) cubic.


    Degree of f(x,y)    Point    alpha_k            W_k
    (a) Linear          a        1/3, 1/3, 1/3      1
    (b) Quadratic       a        1/2, 0, 1/2        1/3
                        b        1/2, 1/2, 0        1/3
                        c        0, 1/2, 1/2        1/3
    (c) Cubic           a        1/3, 1/3, 1/3      -27/48
                        b        1/5, 1/5, 3/5      25/48
                        c        3/5, 1/5, 1/5      25/48
                        d        1/5, 3/5, 1/5      25/48

Table 6.7


■ triangleQuad 

The function triangleQuad computes $\iint_A f(x, y)\,dx\,dy$ over a triangular region using the cubic formula, case (c) in Fig. 6.10. The triangle is defined by its corner coordinate arrays x and y, where the coordinates must be listed in a counterclockwise direction around the triangle.

function I = triangleQuad(func,x,y)
% Cubic quadrature over a triangle.
% USAGE: I = triangleQuad(func,x,y)
% INPUT:
% func = handle of function to be integrated.
% x    = [x1;x2;x3] x-coordinates of corners.
% y    = [y1;y2;y3] y-coordinates of corners.
% OUTPUT:
% I    = integral

alpha = [1/3 1/3 1/3; 1/5 1/5 3/5;...
         3/5 1/5 1/5; 1/5 3/5 1/5];
W = [-27/48; 25/48; 25/48; 25/48];
xNode = alpha*x; yNode = alpha*y;
A = (x(2)*y(3) - x(3)*y(2)...
   - x(1)*y(3) + x(3)*y(1)...
   + x(1)*y(2) - x(2)*y(1))/2;
sum = 0;
for i = 1:4
    z = feval(func,xNode(i),yNode(i));
    sum = sum + W(i)*z;
end
I = A*sum;

EXAMPLE 6.16 



Evaluate $I = \iint_A f(x, y)\,dx\,dy$ over the equilateral triangle shown, where$^{16}$

$$f(x, y) = \frac{1}{2}(x^2 + y^2) - \frac{1}{6}(x^3 - 3xy^2) - \frac{2}{3}$$

Use the quadrature formulas for (1) a quadrilateral and (2) a triangle.

16 This function is identical to the Prandtl stress function for torsion of a bar with the cross section shown; the integral is related to the torsional stiffness of the bar. See, for example, S.P. Timoshenko and J.N. Goodier, Theory of Elasticity, 3rd ed., McGraw-Hill (1970).

Solution of Part (1) Let the triangle be formed by collapsing corners 3 and 4 of a quadrilateral. The corner coordinates of this quadrilateral are $\mathbf{x} = [-1, -1, 2, 2]^T$ and $\mathbf{y} = [\sqrt{3}, -\sqrt{3}, 0, 0]^T$. To determine the minimum required integration order for an exact result, we must examine $f[x(\xi, \eta), y(\xi, \eta)]\,|J(\xi, \eta)|$, the integrand in Eq. (6.45). Since $|J(\xi, \eta)|$ is at most linear in each of $\xi$ and $\eta$, and $f(x, y)$ is cubic, the integrand is a polynomial of degree no higher than 4 in each of $\xi$ and $\eta$. Therefore, third-order integration, which is exact up to degree 5, will suffice. The command used for the computations is similar to the one in Example 6.15:

>> I = gaussQuad2(@fex6_16,[-1;-1;2;2],...
                  [sqrt(3);-sqrt(3);0;0],3)
I =
   -1.5588

The function that returns z = f(x, y) is

function z = fex6_16(x,y)
% Function used in Example 6.16
z = (x^2 + y^2)/2 - (x^3 - 3*x*y^2)/6 - 2/3;


Solution of Part (2) The following command executes quadrature over the triangular element:

>> I = triangleQuad(@fex6_16,[-1;-1;2],[sqrt(3);-sqrt(3);0])
I =
   -1.5588

Since the integrand is a cubic, this result is also exact.

Note that only four function evaluations were required when using the triangle formulas. In contrast, the function had to be evaluated at nine points in Part (1).


EXAMPLE 6.17 

The corner coordinates of a triangle are (0, 0), (16, 10) and (12, 20). Compute $\iint_A (x^2 - y^2)\,dx\,dy$ over this triangle.

Solution

Because $f(x, y)$ is quadratic, quadrature over the three integration points shown in Fig. 6.10(b) will be sufficient for an “exact” result. Note that the integration points lie in the middle of each side; their coordinates are (6, 10), (8, 5) and (14, 15). The area of the triangle is obtained from Eq. (6.49):

$$A = \frac{1}{2}\begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = \frac{1}{2}\begin{vmatrix} 1 & 1 & 1 \\ 0 & 16 & 12 \\ 0 & 10 & 20 \end{vmatrix} = 100$$

From Eq. (6.51) we get

$$I = A\sum_{k=a}^{c} W_k f(x_k, y_k) = 100\left[\frac{1}{3}f(6, 10) + \frac{1}{3}f(8, 5) + \frac{1}{3}f(14, 15)\right]$$
$$= \frac{100}{3}\left[(6^2 - 10^2) + (8^2 - 5^2) + (14^2 - 15^2)\right] = -1800$$
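
The arithmetic of this example, including the sign of the result, can be checked with the sketch below. It is not a book listing: it applies the quadratic (three-point, mid-side) rule of Table 6.7 directly, whereas triangleQuad uses the cubic rule; variable names are illustrative.

```matlab
% Quadratic (mid-side) quadrature of x^2 - y^2 over the triangle
x = [0; 16; 12];  y = [0; 10; 20];           % corners, counterclockwise
alpha = [1/2 0 1/2; 1/2 1/2 0; 0 1/2 1/2];   % area coordinates of points a, b, c
W = [1/3; 1/3; 1/3];                         % weights
A = det([1 1 1; x'; y'])/2;                  % area from Eq. (6.49)
xNode = alpha*x;  yNode = alpha*y;           % Cartesian coordinates of the points
I = A*sum(W.*(xNode.^2 - yNode.^2));
fprintf('A = %g, I = %g\n', A, I)
```

The integration points come out as (6, 10), (8, 5) and (14, 15), and the three function values -64, 39 and -29 sum to -54, which gives I = -1800.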


PROBLEM SET 6.3 


1. Use Gauss-Legendre quadrature to compute 

a: (1 - x 2 )(l - y 2 ) dxdy 

2. Evaluate the following integral with Gauss-Legendre quadrature: 


f f 

J y =0 J x= 


x 2 y 2 dxdy 


Jy =0 Jx=0 

3. Compute the approximate value of 

f* 1 /»! 


/.£■ 


1 dxdy 


with Gauss-Legendre quadrature. Use integration order (a) two and (b) three. 
(The true value of the integral is 2.230 985.) 
4. Use third-order Gauss-Legendre quadrature to obtain an approximate value of


rl tv (x y) 

cos- dx ay 


(The exact value of the integral is 1.621 139.) 



5. Map the integral $\iint_A xy\,dx\,dy$ from the quadrilateral region shown to the “standard” rectangle and then evaluate it analytically.



6. Compute $\iint_A x\,dx\,dy$ over the quadrilateral region shown by first mapping it into the “standard” rectangle and then integrating analytically.


7. Use quadrature to compute $\iint_A x^2\,dx\,dy$ over the triangle shown.

8. Evaluate $\iint_A x^3\,dx\,dy$ over the triangle shown in Prob. 7.

9. Evaluate $\iint_A (3 - x)y\,dx\,dy$ over the region shown.

10. Evaluate $\iint_A x^2 y\,dx\,dy$ over the triangle shown in Prob. 9.

11. ■ Evaluate $\iint_A xy(2 - x^2)(2 - xy)\,dx\,dy$ over the region shown.

12. ■ Compute $\iint_A xy\exp(-x^2)\,dx\,dy$ over the region shown in Prob. 11 to four decimal places.



13. Evaluate $\iint_A (1 - x)(y - x)y\,dx\,dy$ over the triangle shown.

14. ■ Estimate $\iint_A \sin\pi x\,dx\,dy$ over the region shown in Prob. 13. Use the cubic integration formula for a triangle. (The exact integral is $1/\pi$.)

15. ■ Compute $\iint_A \sin\pi x\,\sin\pi(y - x)\,dx\,dy$ to six decimal places, where A is the triangular region shown in Prob. 13. Consider the triangle as a degenerate quadrilateral.

16. ■ 



Write a program to evaluate $\iint_A f(x, y)\,dx\,dy$ over an irregular region that has been divided into several triangular elements. Use the program to compute $\iint_A xy(y - x)\,dx\,dy$ over the region shown.
MATLAB Functions

I = quad(func,a,b,tol) uses adaptive Simpson’s rule for evaluating $I = \int_a^b f(x)\,dx$ with an error tolerance tol (default is 1.0e-6). To speed up execution, vectorize the computation of func by using array operators .*, ./ and .^ in the definition of func. For example, if $f(x) = x^3\sin x + 1/x$, specify the function as

function y = func(x)
y = (x.^3).*sin(x) + 1./x

I = dblquad(func,xMin,xMax,yMin,yMax,tol) uses quad to integrate over a rectangle:

$$I = \int_{\text{yMin}}^{\text{yMax}}\int_{\text{xMin}}^{\text{xMax}} f(x, y)\,dx\,dy$$

I = quadl(func,a,b,tol) employs adaptive Lobatto quadrature (this method is not discussed in this book). It is recommended if very high accuracy is desired and the integrand is smooth.

There are no functions for Gaussian quadrature.




Initial Value Problems 


Solve $\mathbf{y}' = \mathbf{F}(x, \mathbf{y})$, $\mathbf{y}(a) = \boldsymbol{\alpha}$


7.1 Introduction 

The general form of a first-order differential equation is

$$y' = f(x, y) \tag{7.1a}$$

where $y' = dy/dx$ and $f(x, y)$ is a given function. The solution of this equation contains an arbitrary constant (the constant of integration). To find this constant, we must know a point on the solution curve; that is, y must be specified at some value of x, say at x = a. We write this auxiliary condition as

$$y(a) = \alpha \tag{7.1b}$$

An ordinary differential equation of order n

$$y^{(n)} = f\left(x, y, y', \ldots, y^{(n-1)}\right) \tag{7.2}$$

can always be transformed into n first-order equations. Using the notation

$$y_1 = y \qquad y_2 = y' \qquad y_3 = y'' \qquad \ldots \qquad y_n = y^{(n-1)} \tag{7.3}$$

the equivalent first-order equations are

$$y_1' = y_2 \qquad y_2' = y_3 \qquad y_3' = y_4 \qquad \ldots \qquad y_n' = f(x, y_1, y_2, \ldots, y_n) \tag{7.4a}$$

The solution now requires the knowledge of n auxiliary conditions. If these conditions are specified at the same value of x, the problem is said to be an initial value problem. Then the auxiliary conditions, called initial conditions, have the form

$$y_1(a) = \alpha_1 \qquad y_2(a) = \alpha_2 \qquad y_3(a) = \alpha_3 \qquad \ldots \qquad y_n(a) = \alpha_n \tag{7.4b}$$

If $y_i$ are specified at different values of x, the problem is called a boundary value problem.

For example,

$$y'' = -y \qquad y(0) = 1 \qquad y'(0) = 0$$

is an initial value problem since both auxiliary conditions imposed on the solution are given at x = 0. On the other hand,

$$y'' = -y \qquad y(0) = 1 \qquad y(\pi) = 0$$

is a boundary value problem because the two conditions are specified at different values of x.

In this chapter we consider only initial value problems. The more difficult boundary value problems are discussed in the next chapter. We also make extensive use of vector notation, which allows us to manipulate sets of first-order equations in a concise form. For example, Eqs. (7.4) are written as

$$\mathbf{y}' = \mathbf{F}(x, \mathbf{y}) \qquad \mathbf{y}(a) = \boldsymbol{\alpha} \tag{7.5a}$$

where

$$\mathbf{F}(x, \mathbf{y}) = \begin{bmatrix} y_2 \\ y_3 \\ \vdots \\ y_n \\ f(x, \mathbf{y}) \end{bmatrix} \tag{7.5b}$$

A numerical solution of differential equations is essentially a table of x- and y-values listed at discrete intervals of x.
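
As a small illustration of the vector notation (a sketch only; the names F, a and alpha are illustrative and not taken from the book's listings), the initial value problem $y'' = -y$, $y(0) = 1$, $y'(0) = 0$ becomes $\mathbf{y}' = \mathbf{F}(x, \mathbf{y})$ with $\mathbf{F} = [y_2, -y_1]^T$:

```matlab
% State vector: y(1) = y, y(2) = y'
F = @(x,y) [y(2); -y(1)];    % right-hand side F(x,y) of the first-order system
a = 0;                       % point at which the initial conditions are given
alpha = [1; 0];              % initial conditions [y(a); y'(a)]
slope = F(a,alpha);          % derivative vector y' at the initial point
disp(slope')
```

At the initial point the derivative vector is $[0, -1]^T$: the solution starts with zero slope and curvature $y'' = -1$.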


7.2 Taylor Series Method 

The Taylor series method is conceptually simple and capable of high accuracy. Its basis is the truncated Taylor series for y about x:

$$\mathbf{y}(x + h) \approx \mathbf{y}(x) + \mathbf{y}'(x)h + \frac{1}{2!}\mathbf{y}''(x)h^2 + \frac{1}{3!}\mathbf{y}'''(x)h^3 + \cdots + \frac{1}{m!}\mathbf{y}^{(m)}(x)h^m \tag{7.6}$$

Because Eq. (7.6) predicts y at x + h from the information available at x, it is also a formula for numerical integration. The last term kept in the series determines the order of integration. For the series in Eq. (7.6) the integration order is m.

The truncation error, due to the terms omitted from the series, is

$$\mathbf{E} = \frac{1}{(m + 1)!}\mathbf{y}^{(m+1)}(\xi)h^{m+1}, \qquad x < \xi < x + h$$

Using the finite difference approximation

$$\mathbf{y}^{(m+1)}(\xi) \approx \frac{\mathbf{y}^{(m)}(x + h) - \mathbf{y}^{(m)}(x)}{h}$$

we obtain the more usable form

$$\mathbf{E} \approx \frac{h^m}{(m + 1)!}\left[\mathbf{y}^{(m)}(x + h) - \mathbf{y}^{(m)}(x)\right] \tag{7.7}$$

which could be incorporated in the algorithm to monitor the error in each integration step.
■ taylor 

The function taylor implements the Taylor series method of integration of order four. It can handle any number of first-order differential equations $y_i' = f_i(x, y_1, \ldots, y_n)$, $i = 1, 2, \ldots, n$. The user is required to supply the function deriv that returns the 4 × n array

$$\mathbf{d} = \begin{bmatrix} (\mathbf{y}')^T \\ (\mathbf{y}'')^T \\ (\mathbf{y}''')^T \\ (\mathbf{y}^{(4)})^T \end{bmatrix} = \begin{bmatrix} y_1' & y_2' & \cdots & y_n' \\ y_1'' & y_2'' & \cdots & y_n'' \\ y_1''' & y_2''' & \cdots & y_n''' \\ y_1^{(4)} & y_2^{(4)} & \cdots & y_n^{(4)} \end{bmatrix}$$

The function returns the arrays xSol and ySol that contain the values of x and y at intervals h.

function [xSol,ySol] = taylor(deriv,x,y,xStop,h)
% 4th-order Taylor series method of integration.
% USAGE: [xSol,ySol] = taylor(deriv,x,y,xStop,h)
% INPUT:
% deriv = handle of function that returns the matrix
%         d = [dy/dx d^2y/dx^2 d^3y/dx^3 d^4y/dx^4].
% x,y   = initial values; y must be a row vector.
% xStop = terminal value of x
% h     = increment of x used in integration (h > 0).
% OUTPUT:
% xSol  = x-values at which solution is computed.
% ySol  = values of y corresponding to the x-values.

if size(y,1) > 1; y = y'; end      % y must be row vector
xSol = zeros(2,1); ySol = zeros(2,length(y));
xSol(1) = x; ySol(1,:) = y;
k = 1;
while x < xStop
    h = min(h,xStop - x);
    d = feval(deriv,x,y);          % Derivatives of [y]
    hh = 1;
    for j = 1:4                    % Build Taylor series
        hh = hh*h/j;               % hh = h^j/j!
        y = y + d(j,:)*hh;
    end
    x = x + h; k = k + 1;
    xSol(k) = x; ySol(k,:) = y;    % Store current soln.
end

■ printSol 

This function prints the results xSol and ySol in tabular form. The amount of data is controlled by the printout frequency freq. For example, if freq = 5, every fifth integration step would be displayed. If freq = 0, only the initial and final values will be shown.

function printSol(xSol,ySol,freq)
% Prints xSol and ySol arrays in tabular format.
% USAGE: printSol(xSol,ySol,freq)
% freq = printout frequency (prints every freq-th
%        line of xSol and ySol).

[m,n] = size(ySol);
if freq == 0; freq = m; end
head = ' x';
for i = 1:n
    head = strcat(head,' y',num2str(i));
end
fprintf(head); fprintf('\n')
for i = 1:freq:m
    fprintf('%14.4e',xSol(i),ySol(i,:)); fprintf('\n')
end
if i ~= m; fprintf('%14.4e',xSol(m),ySol(m,:)); end


EXAMPLE 7.1 

Given that

$$y' + 4y = x^2 \qquad y(0) = 1$$

determine y(0.2) with the fourth-order Taylor series method using a single integration step. Also compute the estimated error from Eq. (7.7) and compare it with the actual error. The analytical solution of the differential equation is

$$y = \frac{31}{32}e^{-4x} + \frac{1}{4}x^2 - \frac{1}{8}x + \frac{1}{32}$$

Solution The Taylor series up to and including the term with $h^4$ is

$$y(h) = y(0) + y'(0)h + \frac{1}{2!}y''(0)h^2 + \frac{1}{3!}y'''(0)h^3 + \frac{1}{4!}y^{(4)}(0)h^4 \tag{a}$$

Differentiation of the differential equation yields

$$y' = -4y + x^2$$
$$y'' = -4y' + 2x = 16y - 4x^2 + 2x$$
$$y''' = 16y' - 8x + 2 = -64y + 16x^2 - 8x + 2$$
$$y^{(4)} = -64y' + 32x - 8 = 256y - 64x^2 + 32x - 8$$

Thus

$$y'(0) = -4(1) = -4$$
$$y''(0) = 16(1) = 16$$
$$y'''(0) = -64(1) + 2 = -62$$
$$y^{(4)}(0) = 256(1) - 8 = 248$$

With h = 0.2 Eq. (a) becomes

$$y(0.2) = 1 + (-4)(0.2) + \frac{1}{2!}(16)(0.2)^2 + \frac{1}{3!}(-62)(0.2)^3 + \frac{1}{4!}(248)(0.2)^4 = 0.4539$$

According to Eq. (7.7) the approximate truncation error is

$$E = \frac{h^4}{5!}\left[y^{(4)}(0.2) - y^{(4)}(0)\right]$$

where

$$y^{(4)}(0) = 248$$
$$y^{(4)}(0.2) = 256(0.4539) - 64(0.2)^2 + 32(0.2) - 8 = 112.04$$

Therefore,

$$E = \frac{(0.2)^4}{5!}(112.04 - 248) = -0.0018$$

The analytical solution yields

$$y(0.2) = \frac{31}{32}e^{-4(0.2)} + \frac{1}{4}(0.2)^2 - \frac{1}{8}(0.2) + \frac{1}{32} = 0.4515$$

so that the actual error is 0.4515 - 0.4539 = -0.0024.
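
The single Taylor step and the error estimate of Eq. (7.7) for $y' + 4y = x^2$, $y(0) = 1$ can be reproduced as follows. This is a sketch rather than a call to the book's taylor function; the derivative expressions are the four obtained above by repeated differentiation, and the names d, yNew and E are illustrative.

```matlab
% Derivatives y', y'', y''', y'''' of the solution of y' = -4y + x^2
d = @(x,y) [ -4*y + x^2;
             16*y - 4*x^2 + 2*x;
            -64*y + 16*x^2 - 8*x + 2;
            256*y - 64*x^2 + 32*x - 8];
x = 0; y = 1; h = 0.2;
d0 = d(x,y);                      % derivatives at the start of the step
yNew = y; hh = 1;
for j = 1:4                       % accumulate the terms h^j/j! * y^(j)
    hh = hh*h/j;
    yNew = yNew + d0(j)*hh;
end
d1 = d(x + h,yNew);               % derivatives at the end of the step
E = h^4/factorial(5)*(d1(4) - d0(4));          % estimate from Eq. (7.7)
yExact = 31/32*exp(-4*h) + h^2/4 - h/8 + 1/32; % analytical solution at x = h
fprintf('y = %.4f  E = %.4f  actual error = %.4f\n', yNew, E, yExact - yNew)
```

Carried out in full precision, the actual error is about -0.0023; the figure -0.0024 in the text results from subtracting the four-digit rounded values.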


EXAMPLE 7.2 

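Solve

$$y'' = -0.1y' - x \qquad y(0) = 0 \qquad y'(0) = 1$$

from x = 0 to 2 with the Taylor series method of order four using h = 0.25.

Solution With $y_1 = y$ and $y_2 = y'$ the equivalent first-order equations and initial conditions are

$$\mathbf{y}' = \begin{bmatrix} y_1' \\ y_2' \end{bmatrix} = \begin{bmatrix} y_2 \\ -0.1y_2 - x \end{bmatrix} \qquad \mathbf{y}(0) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Repeated differentiation of the differential equations yields

$$\mathbf{y}'' = \begin{bmatrix} y_2' \\ -0.1y_2' - 1 \end{bmatrix} = \begin{bmatrix} -0.1y_2 - x \\ 0.01y_2 + 0.1x - 1 \end{bmatrix}$$

$$\mathbf{y}''' = \begin{bmatrix} -0.1y_2' - 1 \\ 0.01y_2' + 0.1 \end{bmatrix} = \begin{bmatrix} 0.01y_2 + 0.1x - 1 \\ -0.001y_2 - 0.01x + 0.1 \end{bmatrix}$$

$$\mathbf{y}^{(4)} = \begin{bmatrix} 0.01y_2' + 0.1 \\ -0.001y_2' - 0.01 \end{bmatrix} = \begin{bmatrix} -0.001y_2 - 0.01x + 0.1 \\ 0.0001y_2 + 0.001x - 0.01 \end{bmatrix}$$

Thus the derivative array required by taylor is

$$\mathbf{d} = \begin{bmatrix} (\mathbf{y}')^{T} \\ (\mathbf{y}'')^{T} \\ (\mathbf{y}''')^{T} \\ (\mathbf{y}^{(4)})^{T} \end{bmatrix} = \begin{bmatrix} y_2 & -0.1y_2 - x \\ -0.1y_2 - x & 0.01y_2 + 0.1x - 1 \\ 0.01y_2 + 0.1x - 1 & -0.001y_2 - 0.01x + 0.1 \\ -0.001y_2 - 0.01x + 0.1 & 0.0001y_2 + 0.001x - 0.01 \end{bmatrix}$$

which is computed by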


function d = fex7_2(x,y)
% Derivatives used in Example 7.2

d = zeros(4,2);
d(1,1) = y(2);
d(1,2) = -0.1*y(2) - x;
d(2,1) = d(1,2);
d(2,2) = 0.01*y(2) + 0.1*x - 1;
d(3,1) = d(2,2);
d(3,2) = -0.001*y(2) - 0.01*x + 0.1;
d(4,1) = d(3,2);
d(4,2) = 0.0001*y(2) + 0.001*x - 0.01;
Here is the solution:

>> [x,y] = taylor(@fex7_2, 0, [0 1], 2, 0.25);
>> printSol(x,y,1)

      x             y1             y2
 0.0000e+000    0.0000e+000    1.0000e+000
 2.5000e-001    2.4431e-001    9.4432e-001
 5.0000e-001    4.6713e-001    8.2829e-001
 7.5000e-001    6.5355e-001    6.5339e-001
 1.0000e+000    7.8904e-001    4.2110e-001
 1.2500e+000    8.5943e-001    1.3281e-001
 1.5000e+000    8.5090e-001   -2.1009e-001
 1.7500e+000    7.4995e-001   -6.0625e-001
 2.0000e+000    5.4345e-001   -1.0543e+000

The analytical solution of the problem is

$$y = 100x - 5x^{2} + 990\left(e^{-0.1x} - 1\right)$$

from which we obtain y(2) = 0.54345 and y'(2) = -1.0543, which agree with the numerical solution.

The main drawback of the Taylor series method is that it requires repeated differ¬ 
entiation of the dependent variables. These expressions may become very long and 
thus error-prone and tedious to compute. Moreover, there is the extra work of coding 
each of the derivatives. 


7.3 Runge-Kutta Methods 

The aim of Runge-Kutta methods is to eliminate the need for repeated differentiation 
of the differential equations. Since no such differentiation is involved in the first-order 
Taylor series integration formula 

$$\mathbf{y}(x + h) = \mathbf{y}(x) + \mathbf{y}'(x)h = \mathbf{y}(x) + \mathbf{F}(x, \mathbf{y})h \tag{7.8}$$

it can be considered as the first-order Runge-Kutta method; it is also called Euler’s 
method. Due to excessive truncation error, this method is rarely used in practice. 




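Figure 7.1. Graphical representation of Euler's formula. (Plot of y'(x) = f(x, y) against x over the panel from x to x + h; the cross-hatched rectangle is Euler's formula and the remaining area under the curve is the error.)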


Let us now take a look at the graphical interpretation of Euler's formula. For the sake of simplicity, we assume that there is a single dependent variable y, so that the differential equation is y' = f(x, y). The change in the solution y between x and x + h is

$$y(x + h) - y(x) = \int_{x}^{x+h} y'\,dx = \int_{x}^{x+h} f(x, y)\,dx$$

which is the area of the panel under the y'(x) plot, shown in Fig. 7.1. Euler's formula approximates this area by the area of the cross-hatched rectangle. The area between the rectangle and the plot represents the truncation error. Clearly, the truncation error is proportional to the slope of the plot; that is, proportional to y''(x).
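To see how slowly the error of Euler's formula shrinks with h, it can be applied to the problem of Example 7.1, y' = -4y + x^2, y(0) = 1, whose analytical solution is known. The script below is a minimal sketch; the step sizes and variable names are our own choices.

```matlab
% Euler's formula, Eq. (7.8), applied to y' = -4y + x^2, y(0) = 1
f = @(x,y) -4*y + x^2;
xEnd = 0.2;
yExact = 31/32*exp(-4*xEnd) + xEnd^2/4 - xEnd/8 + 1/32;

hList = [0.2 0.1 0.05];          % successively halved step sizes
err = zeros(size(hList));
for k = 1:length(hList)
    h = hList(k); x = 0; y = 1;
    for i = 1:round(xEnd/h)      % number of steps needed to reach xEnd
        y = y + h*f(x,y);        % one Euler step
        x = x + h;
    end
    err(k) = abs(yExact - y);
    fprintf('h = %5.2f   y(0.2) = %.4f   error = %.4f\n', h, y, err(k));
end
```

Each halving of h reduces the error only by a factor of about two to three, whereas the single fourth-order Taylor step of Example 7.1 with h = 0.2 was already in error by about 0.002.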

Second-Order Runge-Kutta Method 

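To arrive at the second-order method, we assume an integration formula of the form

$$\mathbf{y}(x + h) = \mathbf{y}(x) + c_0\mathbf{F}(x, \mathbf{y})h + c_1\mathbf{F}\left[x + ph,\ \mathbf{y} + qh\mathbf{F}(x, \mathbf{y})\right]h \tag{a}$$

and attempt to find the parameters $c_0$, $c_1$, $p$ and $q$ by matching Eq. (a) to the Taylor series

$$\mathbf{y}(x + h) = \mathbf{y}(x) + \mathbf{y}'(x)h + \frac{1}{2!}\mathbf{y}''(x)h^{2} + O(h^{3}) = \mathbf{y}(x) + \mathbf{F}(x, \mathbf{y})h + \frac{1}{2}\mathbf{F}'(x, \mathbf{y})h^{2} + O(h^{3}) \tag{b}$$

Noting that

$$\mathbf{F}'(x, \mathbf{y}) = \frac{\partial \mathbf{F}}{\partial x} + \sum_{i=1}^{n}\frac{\partial \mathbf{F}}{\partial y_i}y_i' = \frac{\partial \mathbf{F}}{\partial x} + \sum_{i=1}^{n}\frac{\partial \mathbf{F}}{\partial y_i}F_i(x, \mathbf{y})$$

where n is the number of first-order equations, we can write Eq. (b) as

$$\mathbf{y}(x + h) = \mathbf{y}(x) + \mathbf{F}(x, \mathbf{y})h + \frac{1}{2}\left(\frac{\partial \mathbf{F}}{\partial x} + \sum_{i=1}^{n}\frac{\partial \mathbf{F}}{\partial y_i}F_i(x, \mathbf{y})\right)h^{2} + O(h^{3}) \tag{c}$$

Returning to Eq. (a), we can rewrite the last term by applying a Taylor series in several variables:

$$\mathbf{F}\left[x + ph,\ \mathbf{y} + qh\mathbf{F}(x, \mathbf{y})\right] = \mathbf{F}(x, \mathbf{y}) + \frac{\partial \mathbf{F}}{\partial x}ph + qh\sum_{i=1}^{n}\frac{\partial \mathbf{F}}{\partial y_i}F_i(x, \mathbf{y}) + O(h^{2})$$

so that Eq. (a) becomes

$$\mathbf{y}(x + h) = \mathbf{y}(x) + (c_0 + c_1)\mathbf{F}(x, \mathbf{y})h + c_1\left[\frac{\partial \mathbf{F}}{\partial x}ph + qh\sum_{i=1}^{n}\frac{\partial \mathbf{F}}{\partial y_i}F_i(x, \mathbf{y})\right]h + O(h^{3}) \tag{d}$$

Comparing Eqs. (c) and (d), we find that they are identical if

$$c_0 + c_1 = 1 \qquad c_1 p = \frac{1}{2} \qquad c_1 q = \frac{1}{2} \tag{e}$$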


Because Eqs. (e) represent three equations in four unknown parameters, we can assign any value to one of the parameters. Some of the popular choices and the names associated with the resulting formulas are:

  c_0 = 0      c_1 = 1      p = 1/2    q = 1/2    Modified Euler's method
  c_0 = 1/2    c_1 = 1/2    p = 1      q = 1      Heun's method
  c_0 = 1/3    c_1 = 2/3    p = 3/4    q = 3/4    Ralston's method

All these formulas are classified as second-order Runge-Kutta methods, with no formula having a numerical superiority over the others. Choosing the modified Euler's method, we substitute the corresponding parameters into Eq. (a) to yield

$$\mathbf{y}(x + h) = \mathbf{y}(x) + \mathbf{F}\left[x + \frac{h}{2},\ \mathbf{y} + \frac{h}{2}\mathbf{F}(x, \mathbf{y})\right]h \tag{f}$$

This integration formula can be conveniently evaluated by the following sequence of operations:

$$\begin{aligned}
\mathbf{K}_1 &= h\mathbf{F}(x, \mathbf{y}) \\
\mathbf{K}_2 &= h\mathbf{F}\left(x + \frac{h}{2},\ \mathbf{y} + \frac{1}{2}\mathbf{K}_1\right) \\
\mathbf{y}(x + h) &= \mathbf{y}(x) + \mathbf{K}_2
\end{aligned} \tag{7.9}$$

Second-order methods are seldom used in computer applications. Most programmers prefer integration formulas of order four, which achieve a given accuracy with less computational effort.



Figure 7.2. Graphical representation of modified Euler formula. (Plot of y'(x) over the panel from x to x + h; the cross-hatched rectangle has the height f(x + h/2, y + K_1/2).)

Figure 7.2 displays the graphical interpretation of the modified Euler formula for a single differential equation y' = f(x, y). The first of Eqs. (7.9) yields an estimate of y at the midpoint of the panel by Euler's formula: $y(x + h/2) = y(x) + f(x, y)h/2 = y(x) + K_1/2$. The second equation then approximates the area of the panel by the area $K_2$ of the cross-hatched rectangle. The error here is proportional to the curvature y''' of the plot.

Fourth-Order Runge-Kutta Method 

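The fourth-order Runge-Kutta method is obtained from the Taylor series along the same lines as the second-order method. Since the derivation is rather long and not very instructive, we skip it. The final form of the integration formula again depends on the choice of the parameters; that is, there is no unique Runge-Kutta fourth-order formula. The most popular version, which is known simply as the Runge-Kutta method, entails the following sequence of operations:

$$\begin{aligned}
\mathbf{K}_1 &= h\mathbf{F}(x, \mathbf{y}) \\
\mathbf{K}_2 &= h\mathbf{F}\left(x + \frac{h}{2},\ \mathbf{y} + \frac{\mathbf{K}_1}{2}\right) \\
\mathbf{K}_3 &= h\mathbf{F}\left(x + \frac{h}{2},\ \mathbf{y} + \frac{\mathbf{K}_2}{2}\right) \\
\mathbf{K}_4 &= h\mathbf{F}(x + h,\ \mathbf{y} + \mathbf{K}_3) \\
\mathbf{y}(x + h) &= \mathbf{y}(x) + \frac{1}{6}\left(\mathbf{K}_1 + 2\mathbf{K}_2 + 2\mathbf{K}_3 + \mathbf{K}_4\right)
\end{aligned} \tag{7.10}$$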


The main drawback of this method is that it does not lend itself to an estimate of the 
truncation error. Therefore, we must guess the integration step size h, or determine 
it by trial and error. In contrast, the so-called adaptive methods can evaluate the 
truncation error in each integration step and adjust the value of h accordingly (but at 
a higher cost of computation). One such adaptive method is introduced in the next 
article. 

■ runKut4 

The function runKut4 implements the Runge-Kutta method of order four. The user 
must provide runKut4 with the function dEqs that defines the first-order differential 
equations y' = F(x, y). 

function [xSol,ySol] = runKut4(dEqs,x,y,xStop,h)
% 4th-order Runge-Kutta integration.
% USAGE: [xSol,ySol] = runKut4(dEqs,x,y,xStop,h)
% INPUT:
% dEqs  = handle of function that specifies the
%         1st-order differential equations
%         F(x,y) = [dy1/dx dy2/dx dy3/dx ...].
% x,y   = initial values; y must be row vector.
% xStop = terminal value of x.
% h     = increment of x used in integration.
% OUTPUT:
% xSol = x-values at which solution is computed.
% ySol = values of y corresponding to the x-values.

if size(y,1) > 1 ; y = y'; end   % y must be row vector
xSol = zeros(2,1); ySol = zeros(2,length(y));
xSol(1) = x; ySol(1,:) = y;
i = 1;
while x < xStop
    i = i + 1;
    h = min(h,xStop - x);        % do not step past xStop
    K1 = h*feval(dEqs,x,y);
    K2 = h*feval(dEqs,x + h/2,y + K1/2);
    K3 = h*feval(dEqs,x + h/2,y + K2/2);
    K4 = h*feval(dEqs,x + h,y + K3);
    y = y + (K1 + 2*K2 + 2*K3 + K4)/6;
    x = x + h;
    xSol(i) = x; ySol(i,:) = y;  % Store current soln.
end

EXAMPLE 7.3 

Use the second-order Runge-Kutta method to integrate 

$$y' = \sin y \qquad y(0) = 1$$

from x = 0 to 0.5 in steps of h = 0.1. Keep four decimal places in the computations.

Solution In this problem we have

$$f(x, y) = \sin y$$

so that the integration formulas in Eqs. (7.9) are

$$\begin{aligned}
K_1 &= hf(x, y) = 0.1\sin y \\
K_2 &= hf\left(x + \frac{h}{2},\ y + \frac{1}{2}K_1\right) = 0.1\sin\left(y + \frac{1}{2}K_1\right) \\
y(x + h) &= y(x) + K_2
\end{aligned}$$

Noting that y(0) = 1, we may proceed with the integration as follows:

$$\begin{aligned}
K_1 &= 0.1\sin 1.0000 = 0.0841 \\
K_2 &= 0.1\sin\left(1.0000 + \frac{0.0841}{2}\right) = 0.0863 \\
y(0.1) &= 1.0 + 0.0863 = 1.0863
\end{aligned}$$

$$\begin{aligned}
K_1 &= 0.1\sin 1.0863 = 0.0885 \\
K_2 &= 0.1\sin\left(1.0863 + \frac{0.0885}{2}\right) = 0.0905 \\
y(0.2) &= 1.0863 + 0.0905 = 1.1768
\end{aligned}$$

and so on. A summary of the computations is shown in the table below.

   x       y        K_1      K_2
  0.0    1.0000   0.0841   0.0863
  0.1    1.0863   0.0885   0.0905
  0.2    1.1768   0.0923   0.0940
  0.3    1.2708   0.0955   0.0968
  0.4    1.3676   0.0979   0.0988
  0.5    1.4664

The exact solution can be shown to be

$$x(y) = \ln(\csc y - \cot y) + 0.604582$$

which yields x(1.4664) = 0.5000. Therefore, up to this point the numerical solution is 
accurate to four decimal places. However, it is unlikely that this precision would be 
maintained if we were to continue the integration. Since the errors (due to truncation 
and roundoff) tend to accumulate, longer integration ranges require better integration 
formulas and more significant figures in the computations. 
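The table of Example 7.3 is easily reproduced by coding Eqs. (7.9) directly. The following is a sketch only (variable names are ours); it works in full double precision, so the last printed digit may differ from the four-decimal hand computation.

```matlab
% Modified Euler (second-order Runge-Kutta) integration of y' = sin(y)
f = @(x,y) sin(y);
h = 0.1; x = 0; y = 1;
fprintf('   x       y       K1      K2\n');
for i = 1:5
    K1 = h*f(x,y);                   % first of Eqs. (7.9)
    K2 = h*f(x + h/2, y + K1/2);     % slope evaluated at the midpoint
    fprintf('%5.1f  %7.4f  %7.4f  %7.4f\n', x, y, K1, K2);
    y = y + K2;
    x = x + h;
end
fprintf('%5.1f  %7.4f\n', x, y);

% Check against the exact solution x(y) quoted in the example
xCheck = log(1/sin(y) - cot(y)) + 0.604582;
```

If the integration is accurate, xCheck should be close to 0.5, the value of x at which the loop terminates.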

EXAMPLE 7.4 

Solve

$$y'' = -0.1y' - x \qquad y(0) = 0 \qquad y'(0) = 1$$

from x = 0 to 2 in increments of h = 0.25 with the fourth-order Runge-Kutta method. (This problem was solved by the Taylor series method in Example 7.2.)

Solution Letting $y_1 = y$ and $y_2 = y'$, we write the equivalent first-order equations as

$$\mathbf{F}(x, \mathbf{y}) = \mathbf{y}' = \begin{bmatrix} y_1' \\ y_2' \end{bmatrix} = \begin{bmatrix} y_2 \\ -0.1y_2 - x \end{bmatrix}$$

which are coded in the following function:

function F = fex7_4(x,y)
% Differential eqs. used in Example 7.4
F = zeros(1,2);
F(1) = y(2); F(2) = -0.1*y(2) - x;

Comparing the function fex7_4 here with fex7_2 in Example 7.2 we note that it is much simpler to input the differential equations for the Runge-Kutta method than for the Taylor series method. Here are the results of integration:

>> [x,y] = runKut4(@fex7_4,0,[0 1],2,0.25);
>> printSol(x,y,1)

      x             y1             y2
 0.0000e+000    0.0000e+000    1.0000e+000
 2.5000e-001    2.4431e-001    9.4432e-001
 5.0000e-001    4.6713e-001    8.2829e-001
 7.5000e-001    6.5355e-001    6.5339e-001
 1.0000e+000    7.8904e-001    4.2110e-001
 1.2500e+000    8.5943e-001    1.3281e-001
 1.5000e+000    8.5090e-001   -2.1009e-001
 1.7500e+000    7.4995e-001   -6.0625e-001
 2.0000e+000    5.4345e-001   -1.0543e+000


These results are the same as obtained by the Taylor series method in Example 7.2. 
This was expected, since both methods are of the same order. 
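For readers who wish to verify the final row of the table without the runKut4 and printSol files, the sketch below carries out the same eight Runge-Kutta steps inline and compares the result with the analytical solution quoted in Example 7.2. The anonymous function F and the variable names are ours.

```matlab
% Fourth-order Runge-Kutta integration of y'' = -0.1y' - x
% written as y1' = y2, y2' = -0.1*y2 - x, with y(0) = [0 1]
F = @(x,y) [y(2), -0.1*y(2) - x];
x = 0; y = [0 1]; h = 0.25;
for i = 1:8                          % 8 steps of 0.25 reach x = 2
    K1 = h*F(x,y);
    K2 = h*F(x + h/2, y + K1/2);
    K3 = h*F(x + h/2, y + K2/2);
    K4 = h*F(x + h, y + K3);
    y = y + (K1 + 2*K2 + 2*K3 + K4)/6;
    x = x + h;
end

% Analytical solution and its derivative at x = 2
yExact  = 100*x - 5*x^2 + 990*(exp(-0.1*x) - 1);
dyExact = 100 - 10*x - 99*exp(-0.1*x);
fprintf('y(2)  = %.5f (exact %.5f)\n', y(1), yExact);
fprintf('y''(2) = %.5f (exact %.5f)\n', y(2), dyExact);
```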

EXAMPLE 7.5 

Use the fourth-order Runge-Kutta method to integrate

$$y' = 3y - 4e^{-x} \qquad y(0) = 1$$

from x = 0 to 10 in steps of h = 0.1. Compare the result with the analytical solution $y = e^{-x}$.

Solution The function specifying the differential equation is

function F = fex7_5(x,y)
% Differential eq. used in Example 7.5.
F = 3*y - 4*exp(-x);

The solution is (every 20th line was printed):

>> [x,y] = runKut4(@fex7_5,0,1,10,0.1);
>> printSol(x,y,20)

      x             y1
 0.0000e+000    1.0000e+000
 2.0000e+000    1.3250e-001
 4.0000e+000   -1.1237e+000
 6.0000e+000   -4.6056e+002
 8.0000e+000   -1.8575e+005
 1.0000e+001   -7.4912e+007


It is clear that something went wrong. According to the analytical solution, y 
should decrease to zero with increasing x, but the output shows the opposite trend: 
after an initial decrease, the magnitude of y increases dramatically. The explanation 
is found by taking a closer look at the analytical solution. The general solution of the 
given differential equation is 


$$y = Ce^{3x} + e^{-x}$$

which can be verified by substitution. The initial condition y(0) = 1 yields C = 0, so that the solution to the problem is indeed $y = e^{-x}$.
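The size of the unwanted contribution can be checked numerically. The sketch below repeats the integration with a plain fourth-order Runge-Kutta loop (written inline so that it runs on its own) and reports the value of C that the computed y(10) would imply in the general solution; the name Cimplied is ours.

```matlab
% Runge-Kutta integration of y' = 3y - 4exp(-x), y(0) = 1, with h = 0.1
F = @(x,y) 3*y - 4*exp(-x);
x = 0; y = 1; h = 0.1;
for i = 1:100                        % 100 steps reach x = 10
    K1 = h*F(x,y);
    K2 = h*F(x + h/2, y + K1/2);
    K3 = h*F(x + h/2, y + K2/2);
    K4 = h*F(x + h, y + K3);
    y = y + (K1 + 2*K2 + 2*K3 + K4)/6;
    x = x + h;
end

% In y = C*exp(3x) + exp(-x), which C reproduces the computed y(10)?
Cimplied = (y - exp(-10))/exp(30);
fprintf('computed y(10) = %.4e, exact y(10) = %.4e\n', y, exp(-10));
fprintf('implied C      = %.2e\n', Cimplied);
```

A coefficient of only a few millionths is enough to swamp the true solution, because it is multiplied by $e^{30} \approx 10^{13}$ at x = 10.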

The cause of trouble in the numerical solution is the dormant term $Ce^{3x}$. Suppose that the initial condition contains a small error $\varepsilon$, so that we have $y(0) = 1 + \varepsilon$. This changes the analytical solution to

$$y = \varepsilon e^{3x} + e^{-x}$$

We now see that the term containing the error $\varepsilon$ becomes dominant as x is increased. Since errors inherent in the numerical solution have the same effect as small changes in initial conditions, we conclude that our numerical solution is the victim of numerical instability due to sensitivity of the solution to initial conditions. The lesson here is: do not always trust the results of numerical integration.

EXAMPLE 7.6

A spacecraft is launched at an altitude H = 772 km above sea level with the speed $v_0$ = 6700 m/s in the direction shown. The differential equations describing the motion of the spacecraft are

$$\ddot{r} = r\dot{\theta}^{2} - \frac{GM_e}{r^{2}} \qquad \ddot{\theta} = -\frac{2\dot{r}\dot{\theta}}{r}$$

where r and $\theta$ are the polar coordinates of the spacecraft. The constants involved in the motion are

  G = 6.672 x 10^-11 m^3 kg^-1 s^-2 = universal gravitational constant
  M_e = 5.9742 x 10^24 kg = mass of the earth
  R_e = 6378.14 km = radius of the earth at sea level

(1) Derive the first-order differential equations and the initial conditions of the form $\dot{\mathbf{y}} = \mathbf{F}(t, \mathbf{y})$, $\mathbf{y}(0) = \mathbf{b}$. (2) Use the fourth-order Runge-Kutta method to integrate the equations from the time of launch until the spacecraft hits the earth. Determine $\theta$ at the impact site.

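Solution of Part (1) We have

$$GM_e = \left(6.672 \times 10^{-11}\right)\left(5.9742 \times 10^{24}\right) = 3.9860 \times 10^{14}\ \text{m}^3\,\text{s}^{-2}$$

Letting

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} r \\ \dot{r} \\ \theta \\ \dot{\theta} \end{bmatrix}$$

the equivalent first-order equations become

$$\dot{\mathbf{y}} = \begin{bmatrix} \dot{y}_1 \\ \dot{y}_2 \\ \dot{y}_3 \\ \dot{y}_4 \end{bmatrix} = \begin{bmatrix} y_2 \\ y_1 y_4^{2} - 3.9860 \times 10^{14}/y_1^{2} \\ y_4 \\ -2y_2 y_4/y_1 \end{bmatrix}$$

with the initial conditions

$$\begin{aligned}
r(0) &= R_e + H = (6378.14 + 772) \times 10^{3} = 7.15014 \times 10^{6}\ \text{m} \\
\dot{r}(0) &= 0 \\
\theta(0) &= 0 \\
\dot{\theta}(0) &= v_0/r(0) = (6700)/\left(7.15014 \times 10^{6}\right) = 0.937045 \times 10^{-3}\ \text{rad/s}
\end{aligned}$$

Therefore,

$$\mathbf{y}(0) = \begin{bmatrix} 7.15014 \times 10^{6} \\ 0 \\ 0 \\ 0.937045 \times 10^{-3} \end{bmatrix}$$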


Solution of Part (2) The function that returns the differential equations is 


function F = fex7_6(x,y)
% Differential eqs. used in Example 7.6.
F = zeros(1,4);
F(1) = y(2);
F(2) = y(1)*y(4)^2 - 3.9860e14/y(1)^2;
F(3) = y(4);
F(4) = -2*y(2)*y(4)/y(1);

The program used for numerical integration is listed below. Note that the inde¬ 
pendent variable t is denoted by x. 


% Example 7.6 (Runge-Kutta integration) 
x = 0; y = [7.15014e6 0 0 0.937045e-3]; 
xStop = 1200; h = 50; freq = 2; 

[xSol,ySol] = runKut4(@fex7_6,x,y,xStop,h); 
printSol(xSol,ySol,freq) 

Here is the output:

>>    x             y1             y2             y3             y4
 0.0000e+000    7.1501e+006    0.0000e+000    0.0000e+000    9.3704e-004
 1.0000e+002    7.1426e+006   -1.5173e+002    9.3771e-002    9.3904e-004
 2.0000e+002    7.1198e+006   -3.0276e+002    1.8794e-001    9.4504e-004
 3.0000e+002    7.0820e+006   -4.5236e+002    2.8292e-001    9.5515e-004
 4.0000e+002    7.0294e+006   -5.9973e+002    3.7911e-001    9.6951e-004
 5.0000e+002    6.9622e+006   -7.4393e+002    4.7697e-001    9.8832e-004
 6.0000e+002    6.8808e+006   -8.8389e+002    5.7693e-001    1.0118e-003
 7.0000e+002    6.7856e+006   -1.0183e+003    6.7950e-001    1.0404e-003
 8.0000e+002    6.6773e+006   -1.1456e+003    7.8520e-001    1.0744e-003
 9.0000e+002    6.5568e+006   -1.2639e+003    8.9459e-001    1.1143e-003
 1.0000e+003    6.4250e+006   -1.3708e+003    1.0083e+000    1.1605e-003
 1.1000e+003    6.2831e+006   -1.4634e+003    1.1269e+000    1.2135e-003
 1.2000e+003    6.1329e+006   -1.5384e+003    1.2512e+000    1.2737e-003


The spacecraft hits the earth when r equals $R_e = 6.37814 \times 10^{6}$ m. This occurs between t = 1000 and 1100 s. A more accurate value of t can be obtained by polynomial interpolation. If no great precision is needed, linear interpolation will do. Letting $1000 + \Delta t$ be the time of impact, we can write

$$r(1000 + \Delta t) = R_e$$

Expanding r in a two-term Taylor series, we get

$$\begin{aligned}
r(1000) + \dot{r}(1000)\Delta t &= R_e \\
6.4250 \times 10^{6} + \left(-1.3708 \times 10^{3}\right)\Delta t &= 6378.14 \times 10^{3}
\end{aligned}$$

from which

$$\Delta t = 34.184\ \text{s}$$

Thus the time of impact is 1034.2 s.

The coordinate $\theta$ of the impact site can be estimated in a similar manner. Using again two terms of the Taylor series, we have

$$\begin{aligned}
\theta(1000 + \Delta t) &= \theta(1000) + \dot{\theta}(1000)\Delta t \\
&= 1.0083 + \left(1.1605 \times 10^{-3}\right)(34.184) \\
&= 1.0480\ \text{rad} \approx 60.0^{\circ}
\end{aligned}$$
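The two-term Taylor estimates of the impact time and impact angle amount to the following arithmetic, shown here as a short script. The numbers are copied from the t = 1000 s row of the output; the variable names are ours.

```matlab
% Estimate of the impact time and angle from the row t = 1000 s
Re       = 6378.14e3;     % radius of the earth (m)
t0       = 1000;          % last tabulated time before impact (s)
r        = 6.4250e6;      % y1 = r at t0 (m)
rDot     = -1.3708e3;     % y2 = dr/dt at t0 (m/s)
theta    = 1.0083;        % y3 = theta at t0 (rad)
thetaDot = 1.1605e-3;     % y4 = d(theta)/dt at t0 (rad/s)

dt = (Re - r)/rDot;                   % from r(t0) + rDot*dt = Re
tImpact = t0 + dt;
thetaImpact = theta + thetaDot*dt;    % two-term Taylor series for theta

fprintf('dt = %.3f s, impact at t = %.1f s\n', dt, tImpact);
fprintf('theta = %.4f rad = %.2f deg\n', thetaImpact, thetaImpact*180/pi);
```

A more careful approach would bracket the impact between two integration steps and refine the root of r(t) - R_e = 0, but for the accuracy sought here the linear estimate suffices.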


PROBLEM SET 7.1 

1. Given

$$y' + 4y = x^{2} \qquad y(0) = 1$$

compute y(0.1) using one step of the Taylor series method of order (a) two and (b) four. Compare the result with the analytical solution

$$y(x) = \frac{31}{32}e^{-4x} + \frac{1}{4}x^{2} - \frac{1}{8}x + \frac{1}{32}$$

2. Solve Prob. 1 with one step of the Runge-Kutta method of order (a) two and (b) four.

3. Integrate

$$y' = \sin y \qquad y(0) = 1$$

from x = 0 to 0.5 with the second-order Taylor series method using h = 0.1. Compare the result with Example 7.3.

4. Verify that the problem

$$y' = y^{1/3} \qquad y(0) = 0$$

has two solutions: y = 0 and $y = (2x/3)^{3/2}$. Which of the solutions would be reproduced by numerical integration if the initial condition is set at (a) y = 0 and (b) $y = 10^{-16}$? Verify your conclusions by integrating with any numerical method.

5. Convert the following differential equations into first-order equations of the form $\mathbf{y}' = \mathbf{F}(x, \mathbf{y})$:

(a) $\ln y' + y = \sin x$

(b) $y''y - xy' - 2y^{2} = 0$

(c) $y^{(4)} - 4y''\sqrt{1 - y^{2}} = 0$

(d) $(y'')^{2} = \left|32y'x - y^{2}\right|$

6. In the following sets of coupled differential equations t is the independent variable. Convert these equations into first-order equations of the form $\dot{\mathbf{y}} = \mathbf{F}(t, \mathbf{y})$:

(a) $\ddot{y} = x - 2y \qquad \ddot{x} = y - x$

(b) $\ddot{y} = -y\left(\dot{y}^{2} + \dot{x}^{2}\right)^{1/4} \qquad \ddot{x} = -x\left(\dot{y}^{2} + \dot{x}^{2}\right)^{1/4} - 32$

(c) $\ddot{y}^{2} + t\sin y = 4\dot{x} \qquad x\ddot{x} + t\cos y = 4\dot{y}$


7. ■ The differential equation for the motion of a simple pendulum is

$$\frac{d^{2}\theta}{dt^{2}} = -\frac{g}{L}\sin\theta$$

where

  θ = angular displacement from the vertical
  g = gravitational acceleration
  L = length of the pendulum

With the transformation $\tau = t\sqrt{g/L}$ the equation becomes

$$\frac{d^{2}\theta}{d\tau^{2}} = -\sin\theta$$

Use numerical integration to determine the period of the pendulum if the amplitude is $\theta_0$ = 1 rad. Note that for small amplitudes ($\sin\theta \approx \theta$) the period is $2\pi\sqrt{L/g}$.

8. ■ A skydiver of mass m in a vertical free fall experiences an aerodynamic drag force $F_D = C_D\dot{y}^{2}$, where y is measured downward from the start of the fall. The differential equation describing the fall is

$$\ddot{y} = g - \frac{C_D}{m}\dot{y}^{2}$$

Determine the time of a 500 m fall. Use g = 9.80665 m/s^2, $C_D$ = 0.2028 kg/m and m = 80 kg.



9. ■ The spring-mass system is at rest when the force P(t) is applied, where

$$P(t) = \begin{cases} 10t\ \text{N} & \text{when } t < 2\ \text{s} \\ 20\ \text{N} & \text{when } t \geq 2\ \text{s} \end{cases}$$

The differential equation for the ensuing motion is

$$\ddot{y} = \frac{P(t)}{m} - \frac{k}{m}y$$

Determine the maximum displacement of the mass. Use m = 2.5 kg and k = 75 N/m.

10. ■ The conical float is free to slide on a vertical rod. When the float is disturbed from its equilibrium position, it undergoes oscillating motion described by the differential equation

$$\ddot{y} = g\left(1 - ay^{3}\right)$$

where a = 16 m^-3 (determined by the density and dimensions of the float) and g = 9.80665 m/s^2. If the float is raised to the position y = 0.1 m and released, determine the period and the amplitude of the oscillations.

11. ■ The pendulum is suspended from a sliding collar. The system is at rest when the oscillating motion $y(t) = Y\sin\omega t$ is imposed on the collar, starting at t = 0. The differential equation describing the motion of the pendulum is

$$\ddot{\theta} = -\frac{g}{L}\sin\theta + \frac{\omega^{2}}{L}Y\cos\theta\sin\omega t$$

Plot $\theta$ vs. t from t = 0 to 10 s and determine the largest $\theta$ during this period. Use g = 9.80665 m/s^2, L = 1.0 m, Y = 0.25 m and $\omega$ = 2.5 rad/s.

12. ■ The system consisting of a sliding mass and a guide rod is at rest with the mass at r = 0.75 m. At time t = 0 a motor is turned on that imposes the motion $\theta(t) = (\pi/12)\cos\pi t$ on the rod. The differential equation describing the resulting motion of the slider is

$$\ddot{r} = \left(\frac{\pi^{2}}{12}\right)^{2} r\sin^{2}\pi t - g\sin\left(\frac{\pi}{12}\cos\pi t\right)$$

Determine the time when the slider reaches the tip of the rod. Use g = 9.80665 m/s^2.

13. ■ A ball of mass m = 0.25 kg is launched with the velocity $v_0$ = 50 m/s in the direction shown. If the aerodynamic drag force acting on the ball is $F_D = C_D v^{3/2}$, the differential equations describing the motion are

$$\ddot{x} = -\frac{C_D}{m}\dot{x}v^{1/2} \qquad \ddot{y} = -\frac{C_D}{m}\dot{y}v^{1/2} - g$$

where $v = \sqrt{\dot{x}^{2} + \dot{y}^{2}}$. Determine the time of flight and the range R. Use $C_D$ = 0.03 kg/(m·s)^{1/2} and g = 9.80665 m/s^2.

14. ■ The differential equation describing the angular position $\theta$ of a mechanical arm is

$$\ddot{\theta} = \frac{a(b - \theta) - \theta\dot{\theta}^{2}}{1 + \theta^{2}}$$

where a = 100 s^-2 and b = 15. If $\theta(0) = 2\pi$ and $\dot{\theta}(0) = 0$, compute $\theta$ and $\dot{\theta}$ when t = 0.5 s.

15. ■ The mass m is suspended from an elastic cord with an extensional stiffness k and undeformed length L. If the mass is released from rest at $\theta$ = 60° with the cord unstretched, find the length r of the cord when the position $\theta$ = 0 is reached for the first time. The differential equations describing the motion are

$$\ddot{r} = r\dot{\theta}^{2} + g\cos\theta - \frac{k}{m}(r - L)$$

$$\ddot{\theta} = \frac{-2\dot{r}\dot{\theta} - g\sin\theta}{r}$$

Use g = 9.80665 m/s^2, k = 40 N/m, L = 0.5 m and m = 0.25 kg.

16. ■ Solve Prob. 15 if the pendulum is released from the position $\theta$ = 60° with the cord stretched by 0.075 m.


17. ■ Consider the mass-spring system where dry friction is present between the block and the horizontal surface. The frictional force has a constant magnitude $\mu mg$ ($\mu$ is the coefficient of friction) and always opposes the motion. The differential equation for the motion of the block can be expressed as

$$\ddot{y} = -\frac{k}{m}y - \mu g\frac{\dot{y}}{|\dot{y}|}$$

where y is measured from the position where the spring is unstretched. If the block is released from rest at $y = y_0$, verify by numerical integration that the next positive peak value of y is $y_0 - 4\mu mg/k$ (this relationship can be derived analytically). Use k = 3000 N/m, m = 6 kg, $\mu$ = 0.5, g = 9.80665 m/s^2 and $y_0$ = 0.1 m.

18. ■ Integrate the following problems from x = 0 to 20 and plot y vs. x:

(a) $y'' + 0.5\left(y^{2} - 1\right)y' + y = 0 \qquad y(0) = 1 \qquad y'(0) = 0$

(b) $y'' = y\cos 2x \qquad y(0) = 0 \qquad y'(0) = 1$

These differential equations arise in nonlinear vibration analysis.

19. ■ The solution of the problem

$$y'' + \frac{1}{x}y' + y = 0 \qquad y(0) = 1 \qquad y'(0) = 0$$

is the Bessel function $J_0(x)$. Use numerical integration to compute $J_0(5)$ and compare the result with -0.17760, the value listed in mathematical tables. Hint: to avoid singularity at x = 0, start the integration at $x = 10^{-12}$.

20. ■ Consider the initial value problem

$$y'' = 16.81y \qquad y(0) = 1.0 \qquad y'(0) = -4.1$$

(a) Derive the analytical solution. (b) Do you anticipate difficulties in numerical solution of this problem? (c) Try numerical integration from x = 0 to 8 to see if your concerns were justified.


21. ■ Kirchhoff's equations for the circuit shown are

$$L\frac{di_1}{dt} + Ri_1 + 2R(i_1 + i_2) = E(t) \tag{a}$$

$$\frac{q_2}{C} + Ri_2 + 2R(i_2 + i_1) = E(t) \tag{b}$$

Differentiating Eq. (b) and substituting the charge-current relationship $dq_2/dt = i_2$, we get

$$\frac{di_1}{dt} = \frac{-3Ri_1 - 2Ri_2 + E(t)}{L} \tag{c}$$

$$\frac{di_2}{dt} = -\frac{2}{3}\frac{di_1}{dt} - \frac{i_2}{3RC} + \frac{1}{3R}\frac{dE}{dt} \tag{d}$$

We could substitute $di_1/dt$ from Eq. (c) into Eq. (d), so that the latter would assume the usual form $di_2/dt = f(t, i_1, i_2)$, but it is more convenient to leave the equations as they are. Assuming that the voltage source is turned on at time t = 0, plot the loop currents $i_1$ and $i_2$ from t = 0 to 0.05 s. Use $E(t) = 240\sin(120\pi t)$ V, R = 1.0 Ω, L = 0.2 x 10^-3 H and C = 3.5 x 10^-3 F.

22. ■ The constant voltage source E of the circuit shown is turned on at t = 0, causing transient currents $i_1$ and $i_2$ in the two loops that last about 0.05 s. Plot these currents from t = 0 to 0.05 s, using the following data: E = 9 V, R = 0.25 Ω, L = 1.2 x 10^-3 H and C = 5 x 10^-3 F. Kirchhoff's equations for the two loops are

$$L\frac{di_1}{dt} + Ri_1 + \frac{q_1 - q_2}{C} = E$$

$$L\frac{di_2}{dt} + Ri_2 + \frac{q_2 - q_1}{C} + \frac{q_2}{C} = 0$$

Two additional equations are the current-charge relationships

$$\frac{dq_1}{dt} = i_1 \qquad \frac{dq_2}{dt} = i_2$$
7.4 Stability and Stiffness 

Loosely speaking, a method of numerical integration is said to be stable if the effects of local errors do not accumulate catastrophically; that is, if the global error remains bounded. If the method is unstable, the global error will increase exponentially, eventually causing numerical overflow. Stability has nothing to do with accuracy; in fact, an inaccurate method can be very stable.

Stability is determined by three factors: the differential equations, the method of solution and the value of the increment h. Unfortunately, it is not easy to determine stability beforehand, unless the differential equation is linear.

Stability of Euler's Method

As a simple illustration of stability, consider the problem

$$y' = -\lambda y \qquad y(0) = \beta \tag{7.11}$$

where $\lambda$ is a positive constant. The exact solution of this problem is

$$y(x) = \beta e^{-\lambda x}$$

Let us now investigate what happens when we attempt to solve Eq. (7.11) numerically with Euler's formula

$$y(x + h) = y(x) + hy'(x) \tag{7.12}$$

Substituting $y'(x) = -\lambda y(x)$, we get

$$y(x + h) = (1 - \lambda h)y(x)$$

If $|1 - \lambda h| > 1$, the method is clearly unstable since |y| increases in every integration step. Thus Euler's method is stable only if $|1 - \lambda h| \leq 1$, or

$$h \leq 2/\lambda \tag{7.13}$$

The results can be extended to a system of n differential equations of the form

$$\mathbf{y}' = -\boldsymbol{\Lambda}\mathbf{y} \tag{7.14}$$

where $\boldsymbol{\Lambda}$ is a constant matrix with the positive eigenvalues $\lambda_i$, i = 1, 2, ..., n. It can be shown that Euler's method of integration is stable only if

$$h < 2/\lambda_{\max} \tag{7.15}$$

where $\lambda_{\max}$ is the largest eigenvalue of $\boldsymbol{\Lambda}$.
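The stability limit is easy to observe in practice. The sketch below applies Euler's formula to y' = -10y, y(0) = 1, for which the limit is h = 2/10 = 0.2, once with a step size just below the limit and once just above it. The values of the decay constant, the step sizes and the number of steps are our own illustrative choices.

```matlab
% Euler's method applied to y' = -lambda*y on either side of h = 2/lambda
lambda = 10; beta = 1; nSteps = 50;
hList = [0.15 0.25];                 % stability limit is 2/lambda = 0.2
yEnd = zeros(size(hList));
for k = 1:length(hList)
    h = hList(k); y = beta;
    for i = 1:nSteps
        y = y + h*(-lambda*y);       % same as y = (1 - lambda*h)*y
    end
    yEnd(k) = y;
    fprintf('h = %.2f  |1 - lambda*h| = %.2f  y after %d steps = %10.3e\n', ...
            h, abs(1 - lambda*h), nSteps, y);
end
```

With h = 0.15 the amplification factor has magnitude 0.5 and the numerical solution decays (although it oscillates in sign, so it is stable but not accurate); with h = 0.25 the magnitude is 1.5 and the solution grows without bound.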

Stiffness 

An initial value problem is called stiff if some terms in the solution vector y(x) vary much more rapidly with x than others. Stiffness can be easily predicted for the differential equations $\mathbf{y}' = -\boldsymbol{\Lambda}\mathbf{y}$ with constant coefficient matrix $\boldsymbol{\Lambda}$. The solution of these equations is $\mathbf{y}(x) = \sum_i C_i\mathbf{v}_i\exp(-\lambda_i x)$, where $\lambda_i$ are the eigenvalues of $\boldsymbol{\Lambda}$ and $\mathbf{v}_i$ are the corresponding eigenvectors. It is evident that the problem is stiff if there is a large disparity in the magnitudes of the positive eigenvalues.

Numerical integration of stiff equations requires special care. The step size h needed for stability is determined by the largest eigenvalue $\lambda_{\max}$, even if the terms $\exp(-\lambda_{\max}x)$ in the solution decay very rapidly and become insignificant as we move away from the origin.

For example, consider the differential equation¹⁷

$$y'' + 1001y' + 1000y = 0 \tag{7.16}$$

Using $y_1 = y$ and $y_2 = y'$, the equivalent first-order equations are

$$\mathbf{y}' = \begin{bmatrix} y_2 \\ -1000y_1 - 1001y_2 \end{bmatrix}$$

In this case

$$\boldsymbol{\Lambda} = \begin{bmatrix} 0 & -1 \\ 1000 & 1001 \end{bmatrix}$$

The eigenvalues of $\boldsymbol{\Lambda}$ are the roots of

$$|\boldsymbol{\Lambda} - \lambda\mathbf{I}| = \begin{vmatrix} -\lambda & -1 \\ 1000 & 1001 - \lambda \end{vmatrix} = 0$$

Expanding the determinant we get

$$-\lambda(1001 - \lambda) + 1000 = 0$$

which has the solutions $\lambda_1 = 1$ and $\lambda_2 = 1000$. These equations are clearly stiff. According to Eq. (7.15) we would need $h < 2/\lambda_2 = 0.002$ for Euler's method to be stable. The Runge-Kutta method would have approximately the same limitation on the step size.
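When the coefficient matrix is available, the eigenvalues and the resulting step size limit need not be worked out by hand. The sketch below uses MATLAB's built-in eig function on the matrix of Eq. (7.16); the variable names and the "stiffness ratio" label are ours.

```matlab
% Eigenvalues of the coefficient matrix of y'' + 1001y' + 1000y = 0
Lambda = [0 -1; 1000 1001];
lam = sort(eig(Lambda));             % eigenvalues in ascending order
hMax = 2/max(lam);                   % Eq. (7.15), stability limit for Euler
ratio = max(lam)/min(lam);           % disparity between the eigenvalues

fprintf('eigenvalues: %g and %g\n', lam(1), lam(2));
fprintf('largest stable h (Euler) = %g, eigenvalue ratio = %g\n', hMax, ratio);
```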

When the problem is very stiff, the usual methods of solution, such as the Runge-Kutta formulas, become impractical due to the very small h required for stability. These problems are best solved with methods that are specially designed for stiff equations. Stiff problem solvers, which are outside the scope of this text, have much better stability characteristics; some of them are even unconditionally stable. However, the higher degree of stability comes at a cost: the general rule is that stability can be improved only by reducing the order of the method (and thus increasing the truncation error).

EXAMPLE 7.7 

(1) Show that the problem

$$y'' = -\frac{19}{4}y - 10y' \qquad y(0) = -9 \qquad y'(0) = 0$$

is moderately stiff and estimate $h_{\max}$, the largest value of h for which the Runge-Kutta method would be stable. (2) Confirm the estimate by computing y(10) with $h \approx h_{\max}/2$ and $h \approx 2h_{\max}$.

¹⁷ This example is taken from C.E. Pearson, Numerical Methods in Engineering and Science, van Nostrand and Reinhold (1986).


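Solution of Part (1) With the notation $y = y_1$ and $y' = y_2$ the equivalent first-order differential equations are

$$\mathbf{y}' = \begin{bmatrix} y_2 \\ -\dfrac{19}{4}y_1 - 10y_2 \end{bmatrix} = -\boldsymbol{\Lambda}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

where

$$\boldsymbol{\Lambda} = \begin{bmatrix} 0 & -1 \\ \dfrac{19}{4} & 10 \end{bmatrix}$$

The eigenvalues of $\boldsymbol{\Lambda}$ are given by

$$|\boldsymbol{\Lambda} - \lambda\mathbf{I}| = \begin{vmatrix} -\lambda & -1 \\ \dfrac{19}{4} & 10 - \lambda \end{vmatrix} = 0$$

which yields $\lambda_1 = 1/2$ and $\lambda_2 = 19/2$. Because $\lambda_2$ is quite a bit larger than $\lambda_1$, the equations are moderately stiff.

Solution of Part (2) An estimate for the upper limit of the stable range of h can be obtained from Eq. (7.15):

$$h_{\max} = \frac{2}{\lambda_{\max}} = \frac{2}{19/2} = 0.2105$$

Although this formula is strictly valid for Euler's method, it is usually not too far off for higher-order integration formulas.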

Here are the results from the Runge-Kutta method with h = 0.1 (by specifying freq = 0 in printSol, only the initial and final values were printed):

>>    x             y1             y2
 0.0000e+000   -9.0000e+000    0.0000e+000
 1.0000e+001   -6.4011e-002    3.2005e-002

The analytical solution is

$$y(x) = -\frac{19}{2}e^{-x/2} + \frac{1}{2}e^{-19x/2}$$

yielding y(10) = -0.064011, which agrees with the value obtained numerically.

With h = 0.5 we encountered instability, as expected:

>>    x             y1             y2
 0.0000e+000   -9.0000e+000    0.0000e+000
 1.0000e+001    2.7030e+020   -2.5678e+021
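Both runs of Example 7.7 can be repeated with the self-contained sketch below, which codes the fourth-order Runge-Kutta steps inline instead of calling runKut4 and printSol. The loop structure and variable names are ours; the last digits may differ slightly from the printed output because the number of steps is fixed in advance here.

```matlab
% Runge-Kutta integration of y'' = -(19/4)y - 10y', y(0) = -9, y'(0) = 0
F = @(x,y) [y(2), -19/4*y(1) - 10*y(2)];
hList = [0.1 0.5];                   % roughly hMax/2 and 2*hMax
y10 = zeros(size(hList));
for k = 1:length(hList)
    h = hList(k); x = 0; y = [-9 0];
    for i = 1:round(10/h)            % integrate to x = 10
        K1 = h*F(x,y);
        K2 = h*F(x + h/2, y + K1/2);
        K3 = h*F(x + h/2, y + K2/2);
        K4 = h*F(x + h, y + K3);
        y = y + (K1 + 2*K2 + 2*K3 + K4)/6;
        x = x + h;
    end
    y10(k) = y(1);
    fprintf('h = %.1f   y(10) = %12.4e\n', h, y10(k));
end
yExact = -19/2*exp(-10/2) + 1/2*exp(-19*10/2);
fprintf('exact     y(10) = %12.4e\n', yExact);
```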


7.5 Adaptive Runge-Kutta Method 

Determination of a suitable step size h can be a major headache in numerical inte¬ 
gration. If h is too large, the truncation error may be unacceptable; if h is too small, we 
are squandering computational resources. Moreover, a constant step size may not be 
appropriate for the entire range of integration. For example, if the solution curve starts 
off with rapid changes before becoming smooth (as in a stiff problem), we should use 
a small h at the beginning and increase it as we reach the smooth region. This is where 
adaptive methods come in. They estimate the truncation error at each integration step 
and automatically adjust the step size to keep the error within prescribed limits. 

The adaptive Runge-Kutta methods use so-called embedded integration formulas. These formulas come in pairs: one formula has the integration order m, the other one is of order m + 1. The idea is to use both formulas to advance the solution from x to x + h. Denoting the results by $\mathbf{y}_m(x + h)$ and $\mathbf{y}_{m+1}(x + h)$, we may estimate the truncation error in the formula of order m as

$$\mathbf{E}(h) = \mathbf{y}_{m+1}(x + h) - \mathbf{y}_m(x + h) \tag{7.17}$$

What makes the embedded formulas attractive is that they share the points where $\mathbf{F}(x, \mathbf{y})$ is evaluated. This means that once $\mathbf{y}_m(x + h)$ has been computed, relatively small additional effort is required to calculate $\mathbf{y}_{m+1}(x + h)$.

Here are the Runge-Kutta embedded formulas of orders 5 and 4 that were originally derived by Fehlberg; hence they are known as Runge-Kutta-Fehlberg formulas:

$$\begin{aligned}
\mathbf{K}_1 &= h\mathbf{F}(x, \mathbf{y}) \\
\mathbf{K}_i &= h\mathbf{F}\left(x + A_i h,\ \mathbf{y} + \sum_{j=1}^{i-1} B_{ij}\mathbf{K}_j\right), \quad i = 2, 3, \ldots, 6
\end{aligned} \tag{7.18}$$

$$\mathbf{y}_5(x + h) = \mathbf{y}(x) + \sum_{i=1}^{6} C_i\mathbf{K}_i \quad \text{(5th-order formula)} \tag{7.19a}$$

$$\mathbf{y}_4(x + h) = \mathbf{y}(x) + \sum_{i=1}^{6} D_i\mathbf{K}_i \quad \text{(4th-order formula)} \tag{7.19b}$$

The coefficients appearing in these formulas are not unique. The table below gives the coefficients proposed by Cash and Karp¹⁸ which are claimed to be an improvement over Fehlberg's original values.

  i    A_i     B_ij (j = 1, ..., i-1)                                      C_i         D_i
  1    -       -                                                           37/378      2825/27648
  2    1/5     1/5                                                         0           0
  3    3/10    3/40         9/40                                           250/621     18575/48384
  4    3/5     3/10        -9/10       6/5                                 125/594     13525/55296
  5    1      -11/54        5/2       -70/27        35/27                  0           277/14336
  6    7/8     1631/55296   175/512    575/13824    44275/110592  253/4096 512/1771    1/4

Table 7.1. Cash-Karp coefficients for Runge-Kutta-Fehlberg formulas


The solution is advanced with the fifth-order formula in Eq. (7.19a). The fourth-order formula is used only implicitly in estimating the truncation error

$$\mathbf{E}(h) = \mathbf{y}_5(x + h) - \mathbf{y}_4(x + h) = \sum_{i=1}^{6}(C_i - D_i)\mathbf{K}_i \tag{7.20}$$

Since Eq. (7.20) actually applies to the fourth-order formula, it tends to overestimate the error in the fifth-order formula.

Note that $\mathbf{E}(h)$ is a vector, its components $E_i(h)$ representing the errors in the
dependent variables $y_i$. This brings up the question: what is the error measure $e(h)$
that we wish to control? There is no single choice that works well in all problems. If
we want to control the largest component of $\mathbf{E}(h)$, the error measure would be

$$e(h) = \max_i |E_i(h)| \tag{7.21}$$


18 J.R. Cash and A.H. Karp, ACM Transactions on Mathematical Software 16, 201-222 (1990).


We could also control some gross measure of the error, such as the root-mean-square
error defined by

$$\bar{E}(h) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} E_i^2(h)} \tag{7.22}$$

where $n$ is the number of first-order equations. Then we would use

$$e(h) = \bar{E}(h) \tag{7.23}$$

for the error measure. Since the root-mean-square error is easier to handle, we adopt
it for our program.
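
To make the difference between the two error measures concrete, here is a small sketch that evaluates both for a made-up error vector. The numbers in E are purely illustrative and do not come from any example in the text.

```matlab
% Two candidate error measures for a hypothetical per-step error vector E.
E = [3.0e-4, -4.0e-4];             % illustrative values, not from the text
n = length(E);
eMax = max(abs(E));                % Eq. (7.21): largest component
eRms = sqrt(sum(E.*E)/n);          % Eqs. (7.22)-(7.23): root-mean-square error
fprintf('eMax = %.4e, eRms = %.4e\n', eMax, eRms);
```

The root-mean-square measure can never exceed the largest component, so it is the slightly more lenient of the two.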

Error control is achieved by adjusting the increment $h$ so that the per-step error
$e$ is approximately equal to a prescribed tolerance $\varepsilon$. Noting that the truncation error
in the fourth-order formula is $O(h^5)$, we conclude that

$$\frac{e(h_1)}{e(h_2)} \approx \left(\frac{h_1}{h_2}\right)^5 \tag{a}$$

Let us now suppose that we performed an integration step with $h_1$ that resulted in
the error $e(h_1)$. The step size $h_2$ that we should have used can now be obtained from
Eq. (a) by setting $e(h_2) = \varepsilon$:

$$h_2 = h_1\left[\frac{\varepsilon}{e(h_1)}\right]^{1/5} \tag{b}$$


If $h_2 \geq h_1$, we could repeat the integration step with $h_2$, but since the error associated
with $h_1$ was below the tolerance, that would be a waste of a perfectly good result. So
we accept the current step and try $h_2$ in the next step. On the other hand, if $h_2 < h_1$,
we must scrap the current step and repeat it with $h_2$. As Eq. (b) is only an approximation,
it is prudent to incorporate a small margin of safety. In our program we use the
formula

$$h_2 = 0.9\,h_1\left[\frac{\varepsilon}{e(h_1)}\right]^{1/5} \tag{7.24}$$
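
As a quick numerical illustration of Eq. (7.24), the sketch below evaluates the new step size for one hypothetical rejected step. The values of h1, eTol and e are invented so that the arithmetic is easy to follow: the ratio eTol/e is 1/32, whose fifth root is 1/2.

```matlab
% Step-size update from Eq. (7.24); all numbers are illustrative.
h1   = 0.5;      % step size just attempted
eTol = 1.0e-6;   % prescribed per-step tolerance
e    = 3.2e-5;   % hypothetical error measure e(h1) returned by the step
h2 = 0.9*h1*(eTol/e)^(1/5);
if e <= eTol
    fprintf('Step accepted; try h = %g in the next step.\n', h2);
else
    fprintf('Step rejected; repeat it with h = %g.\n', h2);
end
```

Here the error exceeds the tolerance, so the step is repeated with h2 = 0.9(0.5)(0.5) = 0.225.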


Recall that $e(h)$ applies to a single integration step; that is, it is a measure of the local
truncation error. The all-important global truncation error is due to the accumulation
of the local errors. What should $\varepsilon$ be set at in order to achieve a global error no greater
than $\varepsilon_{\text{global}}$? Since $e(h)$ is a conservative estimate of the actual error, setting $\varepsilon = \varepsilon_{\text{global}}$
will usually be adequate. If the number of integration steps is large, it is advisable to
decrease $\varepsilon$ accordingly.

Is there any reason to use the nonadaptive methods at all? Usually no; however,
there are special cases where adaptive methods break down. For example, adaptive
methods generally do not work if $\mathbf{F}(x, \mathbf{y})$ contains discontinuous functions. Because
the error behaves erratically at the point of discontinuity, the program can get stuck
in an infinite loop trying to find the appropriate value of $h$. We would also use a
nonadaptive method if the output is to have evenly spaced values of $x$.

■ runKut5 

The adaptive Runge-Kutta method is implemented in the function runKut5 listed
below. The input argument h is the trial value of the increment for the first integration 
step. 

function [xSol,ySol] = runKut5(dEqs,x,y,xStop,h,eTol)
% 5th-order Runge-Kutta integration.
% USAGE: [xSol,ySol] = runKut5(dEqs,x,y,xStop,h,eTol)
% INPUT:
% dEqs  = handle of function that specifies the
%         1st-order differential equations
%         F(x,y) = [dy1/dx dy2/dx dy3/dx ...].
% x,y   = initial values; y must be row vector.
% xStop = terminal value of x.
% h     = trial value of increment of x.
% eTol  = per-step error tolerance (default = 1.0e-6).
% OUTPUT:
% xSol  = x-values at which solution is computed.
% ySol  = values of y corresponding to the x-values.

if size(y,1) > 1 ; y = y'; end % y must be row vector
if nargin < 6; eTol = 1.0e-6; end
n = length(y);
% Cash-Karp coefficients (Table 7.1)
A = [0 1/5 3/10 3/5 1 7/8];
B = [0            0        0          0             0
     1/5          0        0          0             0
     3/40         9/40     0          0             0
     3/10        -9/10     6/5        0             0
    -11/54        5/2     -70/27      35/27         0
     1631/55296   175/512  575/13824  44275/110592  253/4096];
C = [37/378 0 250/621 125/594 0 512/1771];
D = [2825/27648 0 18575/48384 13525/55296 277/14336 1/4];
% Initialize solution
xSol = zeros(2,1); ySol = zeros(2,n);
xSol(1) = x; ySol(1,:) = y;
stopper = 0; k = 1;
for p = 2:5000
    % Compute K's from Eq. (7.18)
    K = zeros(6,n);
    K(1,:) = h*feval(dEqs,x,y);
    for i = 2:6
        BK = zeros(1,n);
        for j = 1:i-1
            BK = BK + B(i,j)*K(j,:);
        end
        K(i,:) = h*feval(dEqs, x + A(i)*h, y + BK);
    end
    % Compute change in y and per-step error from
    % Eqs.(7.19) & (7.20)
    dy = zeros(1,n); E = zeros(1,n);
    for i = 1:6
        dy = dy + C(i)*K(i,:);
        E = E + (C(i) - D(i))*K(i,:);
    end
    e = sqrt(sum(E.*E)/n);
    % If error within tolerance, accept results and
    % check for termination
    if e <= eTol
        y = y + dy; x = x + h;
        k = k + 1;
        xSol(k) = x; ySol(k,:) = y;
        if stopper == 1
            break
        end
    end
    % Size of next integration step from Eq. (7.24)
    if e ~= 0; hNext = 0.9*h*(eTol/e)^0.2;
    else; hNext = h;
    end
    % Check if next step is the last one (works
    % with positive and negative h)
    if (h > 0) == (x + hNext >= xStop)
        hNext = xStop - x; stopper = 1;
    else
        stopper = 0;   % a rejected "last" step must not end the loop early
    end
    h = hNext;
end

EXAMPLE 7.8

The aerodynamic drag force acting on a certain object in free fall can be approximated by

$$F_D = av^2 e^{-by}$$

where

v = velocity of the object in m/s
y = elevation of the object in meters
a = 7.45 kg/m
b = 10.53 × 10⁻⁵ m⁻¹

The exponential term accounts for the change of air density with elevation. The differential
equation describing the fall is

$$m\ddot{y} = -mg + F_D$$

where g = 9.80665 m/s² and m = 114 kg is the mass of the object. If the object is
released at an elevation of 9 km, determine its elevation and speed after a 10-s fall with
the adaptive Runge-Kutta method.

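Solution The differential equation and the initial conditions are

$$\ddot{y} = -g + \frac{a}{m}\dot{y}^2 \exp(-by) = -9.80665 + \frac{7.45}{114}\dot{y}^2 \exp(-10.53 \times 10^{-5} y)$$

$$y(0) = 9000 \text{ m} \qquad \dot{y}(0) = 0$$

Letting $y_1 = y$ and $y_2 = \dot{y}$, we obtain the equivalent first-order equations and the
initial conditions as

$$\dot{\mathbf{y}} = \begin{bmatrix} \dot{y}_1 \\ \dot{y}_2 \end{bmatrix} = \begin{bmatrix} y_2 \\ -9.80665 + (65.351 \times 10^{-3})\, y_2^2 \exp(-10.53 \times 10^{-5} y_1) \end{bmatrix} \qquad \mathbf{y}(0) = \begin{bmatrix} 9000 \text{ m} \\ 0 \end{bmatrix}$$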

The function describing the differential equations is

function F = fex7_8(x,y)
% Diff. eqs. used in Example 7.8
F = zeros(1,2);
F(1) = y(2);
F(2) = -9.80665...
     + 65.351e-3 * y(2)^2 * exp(-10.53e-5 * y(1));

The commands for performing the integration and displaying the results are
shown below. We specified a per-step error tolerance of $10^{-2}$ in runKut5. Considering
the magnitude of y, this should be enough for about five significant figures in the
solution.

>> [x,y] = runKut5(@fex7_8,0,[9000 0],10,0.5,1.0e-2);
>> printSol(x,y,1)

Execution of the commands resulted in the following output: 


>>
        x              y1              y2
   0.0000e+000    9.0000e+003    0.0000e+000
   5.0000e-001    8.9988e+003   -4.8043e+000
   1.9246e+000    8.9841e+003   -1.4632e+001
   3.2080e+000    8.9627e+003   -1.8111e+001
   4.5031e+000    8.9384e+003   -1.9195e+001
   5.9732e+000    8.9099e+003   -1.9501e+001
   7.7786e+000    8.8746e+003   -1.9549e+001
   1.0000e+001    8.8312e+003   -1.9519e+001


The first integration step was carried out with the prescribed trial value h = 
0.5 s. Apparently the error was well within the tolerance, so that the step was accepted. 
Subsequent step sizes, determined from Eq. (7.24), were considerably larger. 

Inspecting the output, we see that at $t = 10$ s the object is moving with the speed
$v = -\dot{y} = 19.52$ m/s at an elevation of $y = 8831$ m.

EXAMPLE 7.9 

Integrate the moderately stiff problem

$$y'' = -\frac{19}{4}y - 10y' \qquad y(0) = -9 \qquad y'(0) = 0$$

from x = 0 to 10 with the adaptive Runge-Kutta method and plot the results (this
problem also appeared in Example 7.7).

Solution Since we use an adaptive method, there is no need to worry about the stable
range of h, as we did in Example 7.7. As long as we specify a reasonable tolerance
for the per-step error, the algorithm will find the appropriate step size. Here are the
commands and the resulting output:


>> [x,y] = runKut5(@fex7_7,0,[-9 0],10,0.1);
>> printSol(x,y,4)

        x              y1              y2
   0.0000e+000   -9.0000e+000    0.0000e+000
   9.8941e-002   -8.8461e+000    2.6651e+000
   2.1932e-001   -8.4511e+000    3.6653e+000
   3.7058e-001   -7.8784e+000    3.8061e+000
   5.7229e-001   -7.1338e+000    3.5473e+000
   8.6922e-001   -6.1513e+000    3.0745e+000
   1.4009e+000   -4.7153e+000    2.3577e+000
   2.8558e+000   -2.2783e+000    1.1391e+000
   4.3990e+000   -1.0531e+000    5.2656e-001
   5.9545e+000   -4.8385e-001    2.4193e-001
   7.5596e+000   -2.1685e-001    1.0843e-001
   9.1159e+000   -9.9591e-002    4.9794e-002
   1.0000e+001   -6.4010e-002    3.2005e-002


The results are in agreement with the analytical solution. 

The plots of y and y' show every fourth integration step. Note the high density of 
points near x = 0 where y' changes rapidly. As the y'-curve becomes smoother, the 
distance between the points increases.

7.6 Bulirsch-Stoer Method
Midpoint Method 

The midpoint formula of numerical integration of y' = F(x, y) is 

y(x + h) = y(x - h) + 2hF [x, y(x)] (7.25) 

It is a second-order formula, like the modified Euler’s formula. We discuss it here 
because it is the basis of the powerful Bulirsch-Stoer method, which is the technique 
of choice in problems where high accuracy is required. 



Figure 7.3. Graphical representation of the midpoint
formula.

Figure 7.3 illustrates the midpoint formula for a single differential equation
$y' = f(x, y)$. The change in $y$ over the two panels shown is

$$y(x + h) - y(x - h) = \int_{x-h}^{x+h} y'(x)\,dx$$

which equals the area under the $y'(x)$ curve. The midpoint method approximates this
area by the area $2hf(x, y)$ of the cross-hatched rectangle.


Figure 7.4. Mesh used in the midpoint method.


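Consider now advancing the solution of $\mathbf{y}'(x) = \mathbf{F}(x, \mathbf{y})$ from $x = x_0$ to $x_0 + H$ with
the midpoint formula. We divide the interval of integration into $n$ steps of length
$h = H/n$ each, as shown in Fig. 7.4, and carry out the computations

$$\begin{aligned}
\mathbf{y}_1 &= \mathbf{y}_0 + h\mathbf{F}_0 \\
\mathbf{y}_2 &= \mathbf{y}_0 + 2h\mathbf{F}_1 \\
\mathbf{y}_3 &= \mathbf{y}_1 + 2h\mathbf{F}_2 \\
&\;\;\vdots \\
\mathbf{y}_n &= \mathbf{y}_{n-2} + 2h\mathbf{F}_{n-1}
\end{aligned} \tag{7.26}$$

Here we used the notation $\mathbf{y}_i = \mathbf{y}(x_i)$ and $\mathbf{F}_i = \mathbf{F}(x_i, \mathbf{y}_i)$. The first of Eqs. (7.26) uses
the Euler formula to "seed" the midpoint method; the other equations are midpoint
formulas. The final result is obtained by averaging $\mathbf{y}_n$ in Eq. (7.26) and the estimate
$\mathbf{y}_n \approx \mathbf{y}_{n-1} + h\mathbf{F}_n$ available from the Euler formula:

$$\mathbf{y}(x_0 + H) = \frac{1}{2}\left[\mathbf{y}_n + \left(\mathbf{y}_{n-1} + h\mathbf{F}_n\right)\right] \tag{7.27}$$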


Richardson Extrapolation 

It can be shown that the error in Eq. (7.27) is

$$\mathbf{E} = \mathbf{c}_1 h^2 + \mathbf{c}_2 h^4 + \mathbf{c}_3 h^6 + \cdots$$

Herein lies the great utility of the midpoint method: we can eliminate as many of the
leading error terms as we wish by Richardson's extrapolation. For example, we could
compute $\mathbf{y}(x_0 + H)$ with a certain value of $h$ and then repeat the process with $h/2$.
Denoting the corresponding results by $\mathbf{g}(h)$ and $\mathbf{g}(h/2)$, Richardson's extrapolation
(see Eq. (5.9)) then yields the improved result

$$\mathbf{y}_{\text{better}}(x_0 + H) = \frac{4\mathbf{g}(h/2) - \mathbf{g}(h)}{3}$$

which is fourth-order accurate. Another round of integration with $h/4$ followed by
Richardson's extrapolation gets us sixth-order accuracy, etc.

The $\mathbf{y}$'s in Eqs. (7.26) should be viewed as intermediate variables, because unlike
$\mathbf{y}(x_0 + H)$, they cannot be refined by Richardson's extrapolation.

■ midpoint 

The function midpoint in this module combines the midpoint method with Richardson
extrapolation. The first application of the midpoint method uses two integration
steps. The number of steps is increased by two in successive integrations (2, 4, 6, ...),
each integration being followed by Richardson extrapolation. The procedure is stopped
when two successive solutions differ (in the root-mean-square sense) by less than a
prescribed tolerance.

function y = midpoint(dEqs,x,y,xStop,tol)
% Modified midpoint method for integration of y' = F(x,y).
% USAGE: y = midpoint(dEqs,xStart,yStart,xStop,tol)
% INPUT:
% dEqs  = handle of function that returns the first-order
%         differential equations F(x,y) = [dy1/dx,dy2/dx,...].
% x, y  = initial values; y must be a row vector.
% xStop = terminal value of x.
% tol   = per-step error tolerance (default = 1.0e-6).
% OUTPUT:
% y     = y(xStop).

if size(y,1) > 1 ; y = y'; end % y must be row vector
if nargin < 5; tol = 1.0e-6; end
kMax = 51;
n = length(y);
r = zeros(kMax,n); % Storage for Richardson extrapolation.
% Start with two integration steps.
nSteps = 2;
r(1,1:n) = mid(dEqs,x,y,xStop,nSteps);
rOld = r(1,1:n);
for k = 2:kMax
    % Increase the number of steps by two (nSteps = 2*k)
    % & refine results by Richardson extrapolation.
    nSteps = 2*k;
    r(k,1:n) = mid(dEqs,x,y,xStop,nSteps);
    r = richardson(r,k,n);
    % Check for convergence.
    dr = r(1,1:n) - rOld;
    e = sqrt(dot(dr,dr)/n);
    if e < tol; y = r(1,1:n); return; end
    rOld = r(1,1:n);
end
error('Midpoint method did not converge')

function r = richardson(r,k,n)
% Richardson extrapolation.
for j = k-1:-1:1
    c = (k/(k-1))^(2*(k-j));
    r(j,1:n) = (c*r(j+1,1:n) - r(j,1:n))/(c - 1.0);
end
return

function y = mid(dEqs,x,y,xStop,nSteps)
% Midpoint formulas, Eqs. (7.26) and (7.27).
h = (xStop - x)/nSteps;
y0 = y;
y1 = y0 + h*feval(dEqs,x,y0);
for i = 1:nSteps-1
    x = x + h;
    y2 = y0 + 2.0*h*feval(dEqs,x,y1);
    y0 = y1;
    y1 = y2;
end
% After the loop x = x_(n-1), so F_n is evaluated at x + h = xStop.
y = 0.5*(y1 + y0 + h*feval(dEqs,x + h,y2));


Bulirsch-Stoer Algorithm 

When used on its own, the module midpoint has a major shortcoming: the solution 
at points between the initial and final values of x cannot be refined by Richardson 
extrapolation, so that y is usable only at the last point. This deficiency is rectified in 
the Bulirsch-Stoer method. The fundamental idea behind the method is simple: apply 
the midpoint method in a piecewise fashion. That is, advance the solution in stages of 
length H, using the midpoint method with Richardson extrapolation to perform the 
integration in each stage. The value of H can be quite large, since the precision of the 
result is determined mainly by the step length h in the midpoint method, not by H. 

The original Bulirsch and Stoer technique¹⁹ is a complex procedure that incorporates
many refinements missing in our algorithm. However, the function bulStoer
given below retains the essential ideas of Bulirsch and Stoer.

What are the relative merits of adaptive Runge-Kutta and Bulirsch-Stoer methods?
The Runge-Kutta method is more robust, having higher tolerance for nonsmooth
functions and stiff problems. In most applications where high precision is not required,
it also tends to be more efficient. However, this is not the case in the computation of
high-accuracy solutions involving smooth functions, where the Bulirsch-Stoer algorithm
shines.


■ bulStoer 

This function contains a simplified algorithm for the Bulirsch-Stoer method. 

function [xSol,ySol] = bulStoer(dEqs,x,y,xStop,H,tol)
% Simplified Bulirsch-Stoer method for integration of y' = F(x,y).
% USAGE: [xSol,ySol] = bulStoer(dEqs,x,y,xStop,H,tol)
% INPUT:
% dEqs  = handle of function that returns the first-order
%         differential equations F(x,y) = [dy1/dx,dy2/dx,...].
% x, y  = initial values; y must be a row vector.
% xStop = terminal value of x.
% H     = increment of x at which solution is stored.
% tol   = per-step error tolerance (default = 1.0e-6).
% OUTPUT:
% xSol, ySol = solution at increments H.

if size(y,1) > 1 ; y = y'; end % y must be row vector
if nargin < 6; tol = 1.0e-6; end
n = length(y);
xSol = zeros(2,1); ySol = zeros(2,n);
xSol(1) = x; ySol(1,:) = y;
k = 1;
while x < xStop
    k = k + 1;
    H = min(H,xStop - x);
    y = midpoint(dEqs,x,y,x + H,tol);
    x = x + H;
    xSol(k) = x; ySol(k,:) = y;
end

19 Stoer, J., and Bulirsch, R., Introduction to Numerical Analysis, Springer, 1980.

EXAMPLE 7.10 

Compute the solution of the initial value problem 

$$y' = \sin y \qquad y(0) = 1$$

at x = 0.5 with the midpoint formulas using n = 2 and n = 4, followed by Richardson
extrapolation (this problem was solved with the second-order Runge-Kutta method
in Example 7.3).

Solution With n = 2 the step length is h = 0.25. The midpoint formulas, Eqs. (7.26)
and (7.27), yield

$$\begin{aligned}
y_1 &= y_0 + hf_0 = 1 + 0.25 \sin 1.0 = 1.210\,368 \\
y_2 &= y_0 + 2hf_1 = 1 + 2(0.25) \sin 1.210\,368 = 1.467\,873 \\
y_h(0.5) &= \tfrac{1}{2}(y_2 + y_1 + hf_2) \\
&= \tfrac{1}{2}(1.467\,873 + 1.210\,368 + 0.25 \sin 1.467\,873) \\
&= 1.463\,459
\end{aligned}$$

Using n = 4 we have h = 0.125 and the midpoint formulas become

$$\begin{aligned}
y_1 &= y_0 + hf_0 = 1 + 0.125 \sin 1.0 = 1.105\,184 \\
y_2 &= y_0 + 2hf_1 = 1 + 2(0.125) \sin 1.105\,184 = 1.223\,387 \\
y_3 &= y_1 + 2hf_2 = 1.105\,184 + 2(0.125) \sin 1.223\,387 = 1.340\,248 \\
y_4 &= y_2 + 2hf_3 = 1.223\,387 + 2(0.125) \sin 1.340\,248 = 1.466\,772 \\
y_{h/2}(0.5) &= \tfrac{1}{2}(y_4 + y_3 + hf_4) \\
&= \tfrac{1}{2}(1.466\,772 + 1.340\,248 + 0.125 \sin 1.466\,772) \\
&= 1.465\,672
\end{aligned}$$

Richardson extrapolation results in

$$y(0.5) = \frac{4(1.465\,672) - 1.463\,459}{3} = 1.466\,410$$

which compares favorably with the "true" solution y(0.5) = 1.466 404.
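
The hand computation above can be cross-checked with a few lines of MATLAB. The script below is a minimal sketch written for this example only: it does not call the function midpoint, and the names g, nList and yBetter are illustrative.

```matlab
% Cross-check of Example 7.10: y' = sin y, y(0) = 1, solution wanted at x = 0.5.
F = @(x,y) sin(y);                  % right-hand side of the differential equation
x0 = 0; y0 = 1; xStop = 0.5;
nList = [2 4];                      % numbers of integration steps
g = zeros(1,2);                     % midpoint results for n = 2 and n = 4
for m = 1:2
    n = nList(m);
    h = (xStop - x0)/n;
    x = x0;
    yPrev = y0;                     % y_0
    yCurr = y0 + h*F(x,y0);         % y_1 from the Euler formula
    for i = 1:n-1
        x = x + h;
        yNext = yPrev + 2*h*F(x,yCurr);   % midpoint formula, Eqs. (7.26)
        yPrev = yCurr;
        yCurr = yNext;
    end
    % Eq. (7.27): average y_n with the Euler estimate y_(n-1) + h*F_n
    g(m) = 0.5*(yCurr + yPrev + h*F(x + h,yCurr));
end
yBetter = (4*g(2) - g(1))/3;        % Richardson extrapolation
fprintf('%.6f  %.6f  %.6f\n', g(1), g(2), yBetter);
```

The three printed values should agree with the hand-computed 1.463 459, 1.465 672 and 1.466 410 to within rounding in the last digit.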


EXAMPLE 7.11 


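[Figure: electric circuit with applied voltage E(t)]

The differential equations governing the loop current i and the charge q on the
capacitor of the electric circuit shown are

$$L\frac{di}{dt} + Ri + \frac{q}{C} = E(t) \qquad \frac{dq}{dt} = i$$

If the applied voltage E is suddenly increased from zero to 9 V, plot the resulting loop
current during the first ten seconds. Use R = 1.0 Ω, L = 2 H and C = 0.45 F.

Solution Letting

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} q \\ i \end{bmatrix}$$

and substituting the given data, the differential equations become

$$\dot{\mathbf{y}} = \begin{bmatrix} \dot{y}_1 \\ \dot{y}_2 \end{bmatrix} = \begin{bmatrix} y_2 \\ (-Ry_2 - y_1/C + E)/L \end{bmatrix}$$

The initial conditions are

$$\mathbf{y}(0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$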
We solved the problem with the function bulStoer using the increment H =
0.5 s. The following program utilizes the plotting facilities of MATLAB:

% Example 7.11 (Bulirsch-Stoer integration)
[xSol,ySol] = bulStoer(@fex7_11,0,[0 0],10,0.5);
plot(xSol,ySol(:,2),'k:o')
grid on
xlabel('Time (s)')
ylabel('Current (A)')








[Figure: plot of Current (A) versus Time (s) from 0 to 10 s; open circles mark the computed points]


Recall that in each interval H (the spacing of open circles) the integration was performed
by the modified midpoint method and refined by Richardson's extrapolation.
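
The program above refers to a function fex7_11 whose listing is not shown. A minimal sketch of what it might contain is given below, written as an anonymous function so that it can be tried on its own; the handle name fex and the layout are assumptions, while the circuit data come from the problem statement.

```matlab
% Hypothetical stand-in for fex7_11 (its listing does not appear in the text).
R = 1.0; L = 2.0; C = 0.45;    % circuit data from Example 7.11
E = 9.0;                       % applied voltage after the jump, in volts
% y(1) = q (charge), y(2) = i (current); returns [dq/dt, di/dt] as a row vector
fex = @(x,y) [y(2), (-R*y(2) - y(1)/C + E)/L];
F0 = fex(0,[0 0]);             % derivatives at t = 0 with q = 0 and i = 0
fprintf('dq/dt = %g A, di/dt = %g A/s\n', F0(1), F0(2));
```

At t = 0 the capacitor is uncharged and no current flows, so the whole applied voltage acts on the inductor and di/dt = E/L = 4.5 A/s.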

PROBLEM SET 7.2 

1. Derive the analytical solution of the problem

$$y'' + y' - 380y = 0 \qquad y(0) = 1 \qquad y'(0) = -20$$

Would you expect difficulties in solving this problem numerically?

2. Consider the problem

$$y' = x - 10y \qquad y(0) = 10$$

(a) Verify that the analytical solution is $y(x) = 0.1x - 0.01 + 10.01e^{-10x}$. (b) Determine
the step size h that you would use in numerical solution with the (nonadaptive)
Runge-Kutta method.

3. ■ Integrate the initial value problem in Prob. 2 from x = 0 to 5 with the Runge- 
Kutta method using (a) h = 0.1; (b) h = 0.25; and (c) h = 0.5. Comment on the 
results. 

4. ■ Integrate the initial value problem in Prob. 2 from x = 0 to 10 with the adaptive 
Runge-Kutta method. 

5. ■ [Figure: mass-spring-dashpot system; y is the displacement of the mass m]

The differential equation describing the motion of the mass-spring-dashpot system is

$$\ddot{y} + \frac{c}{m}\dot{y} + \frac{k}{m}y = 0$$

where m = 2 kg, c = 460 N·s/m and k = 450 N/m. The initial conditions are y(0) =
0.01 m and $\dot{y}(0) = 0$. (a) Show that this is a stiff problem and determine a value of
h that you would use in numerical integration with the nonadaptive Runge-Kutta
method. (b) Carry out the integration from t = 0 to 0.2 s with the chosen h and
plot y vs. t.

6. ■ Integrate the initial value problem specified in Prob. 5 with the adaptive Runge-Kutta
method from t = 0 to 0.2 s, and plot y vs. t.

7. ■ Compute the numerical solution of the differential equation

$$y'' = 16.81y$$

from x = 0 to 2 with the adaptive Runge-Kutta method. Use the initial conditions
(a) y(0) = 1.0, y'(0) = -4.1; and (b) y(0) = 1.0, y'(0) = -4.11. Explain the large
difference in the two solutions. Hint: derive the analytical solutions.

8. ■ Integrate

$$y'' + y' - y^2 = 0 \qquad y(0) = 1 \qquad y'(0) = 0$$

from x = 0 to 3.5. Investigate whether the sudden increase in y near the upper
limit is real or an artifact caused by instability. Hint: experiment with different
values of h.

9. ■ Solve the stiff problem (see Eq. (7.16))

$$y'' + 1001y' + 1000y = 0 \qquad y(0) = 1 \qquad y'(0) = 0$$

from x = 0 to 0.2 with the adaptive Runge-Kutta method and plot y' vs. x.

10. ■ Solve

$$y'' + 2y' + 3y = 0 \qquad y(0) = 0 \qquad y'(0) = \sqrt{2}$$

with the adaptive Runge-Kutta method from x = 0 to 5 (the analytical solution is
$y = e^{-x}\sin\sqrt{2}x$).

11. ■ Use the adaptive Runge-Kutta method to solve the differential equation

$$y'' = 2yy'$$

from x = 0 to 10 with the initial conditions y(0) = 1, y'(0) = -1. Plot y vs. x.

12. ■ Repeat Prob. 11 with the initial conditions y(0) = 0, y'(0) = 1 and the integration
range x = 0 to 1.5.

13. ■ Use the adaptive Runge-Kutta method to integrate

$$y' = \left(\frac{9}{y} - y\right)x \qquad y(0) = 5$$

from x = 0 to 5 and plot y vs. x.

14. Solve Prob. 13 with the Bulirsch-Stoer method using H = 0.5. 

15. ■ Integrate

$$x^2y'' + xy' + y = 0 \qquad y(1) = 0 \qquad y'(1) = -2$$

from x = 1 to 20, and plot y and y' vs. x. Use the Bulirsch-Stoer method. 

16. ■ [Figure: block of mass m attached to a spring; x is the position of the block]

The magnetized iron block of mass m is attached to a spring of stiffness k and
free length L. The block is at rest at x = L when the electromagnet is turned on,
exerting the repulsive force $F = c/x^2$ on the block. The differential equation of
the resulting motion is

$$m\ddot{x} = \frac{c}{x^2} - k(x - L)$$

Determine the amplitude and the period of the motion by numerical integration
with the adaptive Runge-Kutta method. Use c = 5 N·m², k = 120 N/m, L = 0.2 m
and m = 1.0 kg.









17. ■ The bar ABC is attached to the vertical rod with a horizontal pin. The assembly
is free to rotate about the axis of the rod. In the absence of friction, the equations
of motion of the system are

$$\ddot{\theta} = \dot{\phi}^2 \sin\theta\cos\theta \qquad \ddot{\phi} = -2\dot{\theta}\dot{\phi}\cot\theta$$

If the system is set into motion with the initial conditions $\theta(0) = \pi/12$ rad, $\dot{\theta}(0) = 0$,
$\phi(0) = 0$ and $\dot{\phi}(0) = 20$ rad/s, obtain a numerical solution with the adaptive
Runge-Kutta method from t = 0 to 1.5 s and plot $\phi$ vs. t.

18. ■ Solve the circuit problem in Example 7.11 if R = 0 and

$$E(t) = \begin{cases} 0 & \text{when } t < 0 \\ 9\sin \pi t & \text{when } t \geq 0 \end{cases}$$

19. ■ Solve Prob. 21 in Problem Set 7.1 if E = 240 V (constant).

20. ■ Kirchhoff's equations for the circuit in the figure are

$$L\frac{di_1}{dt} + R_1 i_1 + R_2(i_1 - i_2) = E(t)$$

$$L\frac{di_2}{dt} + R_2(i_2 - i_1) + \frac{q_2}{C} = 0$$

where

$$\frac{dq_2}{dt} = i_2$$

Using the data R₁ = 4 Ω, R₂ = 10 Ω, L = 0.032 H, C = 0.53 F and

$$E(t) = \begin{cases} 20 \text{ V} & \text{if } 0 < t < 0.005 \text{ s} \\ 0 & \text{otherwise} \end{cases}$$

plot the transient loop currents $i_1$ and $i_2$ from t = 0 to 0.05 s.

21. ■ Consider a closed biological system populated by M number of prey and N
number of predators. Volterra postulated that the two populations are related by
the differential equations

$$\dot{M} = aM - bMN$$
$$\dot{N} = -cN + dMN$$

where a, b, c and d are constants. The steady-state solution is $M_0 = c/d$, $N_0 = a/b$;
if numbers other than these are introduced into the system, the populations
undergo periodic fluctuations. Introducing the notation

$$y_1 = M/M_0 \qquad y_2 = N/N_0$$

allows us to write the differential equations as

$$\dot{y}_1 = a(y_1 - y_1y_2)$$
$$\dot{y}_2 = c(-y_2 + y_1y_2)$$

Using a = 1.0/year, c = 0.2/year, $y_1(0) = 0.1$ and $y_2(0) = 1.0$, plot the two populations
from t = 0 to 50 years.

22. ■ The equations

$$\dot{u} = -au + av$$
$$\dot{v} = cu - v - uw$$
$$\dot{w} = -bw + uv$$

known as the Lorenz equations, are encountered in the theory of fluid dynamics.
Letting a = 5.0, b = 0.9 and c = 8.2, solve these equations from t = 0 to 10 with
the initial conditions u(0) = 0, v(0) = 1.0, w(0) = 2.0 and plot u(t). Repeat the
solution with c = 8.3. What conclusions can you draw from the results?

MATLAB Functions 

[xSol,ySol] = ode23(dEqs,[xStart,xStop],yStart) low-order adaptive
Runge-Kutta method (the Bogacki-Shampine pair of orders 2 and 3). The function dEqs
must return the differential equations as a column vector (recall that runKut4 and runKut5
require row vectors). The range of integration is from xStart to xStop with the
initial conditions yStart (also a column vector).

[xSol,ySol] = ode45(dEqs,[xStart xStop],yStart) is similar to ode23, but
uses a higher-order Runge-Kutta method (the Dormand-Prince pair of orders 4 and 5).

These two methods, as well as all the methods described in this book, belong
to a group known as single-step methods. The name stems from the fact that the
information at a single point on the solution curve is sufficient to compute the next
point. There are also multistep methods that utilize several points on the curve to
extrapolate the solution at the next step. These methods were popular once, but have
lost some of their luster in the last few years. Multistep methods have two shortcomings
that complicate their implementation:

• The methods are not self-starting, but must be provided with the solution at the
first few points by a single-step method.

• The integration formulas assume equally spaced steps, which makes it
difficult to change the step size.

Both of these hurdles can be overcome, but the price is complexity of the algorithm
that increases with sophistication of the method. The benefits of multistep methods
are minimal: the best of them can outperform their single-step counterparts in certain
problems, but these occasions are rare. MATLAB provides one general-purpose
multistep method:

[xSol,ySol] = ode113(dEqs,[xStart xStop],yStart) uses the variable-order
Adams-Bashforth-Moulton method.

MATLAB also has several functions for solving stiff problems. These are ode15s
(this is the first method to try when a stiff problem is encountered), ode23s, ode23t
and ode23tb.

Solve $y'' = f(x, y, y')$, $y(a) = \alpha$, $y(b) = \beta$
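
As a usage illustration, the sketch below applies ode45 to the moderately stiff problem of Example 7.9, written as y'' = -(19/4)y - 10y' with y(0) = -9 and y'(0) = 0. Note the column-vector convention. The default tolerances of ode45 are used, and the comparison relies on the analytical solution y = -9.5e^(-x/2) + 0.5e^(-19x/2), which follows from the characteristic roots -1/2 and -19/2; the names yExact and maxErr are illustrative.

```matlab
% Example 7.9 solved with ode45 (note the column-vector convention).
dEqs = @(x,y) [y(2); -4.75*y(1) - 10*y(2)];          % y'' = -(19/4)y - 10y'
[xSol,ySol] = ode45(dEqs,[0 10],[-9; 0]);
yExact = -9.5*exp(-0.5*xSol) + 0.5*exp(-9.5*xSol);   % analytical solution
maxErr = max(abs(ySol(:,1) - yExact));               % largest deviation in y
fprintf('%d points, max error in y = %.2e\n', length(xSol), maxErr);
```

Tighter accuracy can be requested through the options structure created with odeset (fields RelTol and AbsTol), passed as a fourth argument to ode45.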


8.1 Introduction 

In two-point boundary value problems the auxiliary conditions associated with the 
differential equation, called the boundary conditions, are specified at two different 
values of x. This seemingly small departure from initial value problems has a major 
repercussion—it makes boundary value problems considerably more difficult to solve. 
In an initial value problem we were able to start at the point where the initial values 
were given and march the solution forward as far as needed. This technique does not 
work for boundary value problems, because there are not enough starting conditions 
available at either end point to produce a unique solution. 

One way to overcome the lack of starting conditions is to guess the missing values. 
The resulting solution is very unlikely to satisfy boundary conditions at the other end, 
but by inspecting the discrepancy we can estimate what changes to make to the initial 
conditions before integrating again. This iterative procedure is known as the shooting 
method. The name is derived from analogy with target shooting—take a shot and 
observe where it hits the target, then correct the aim and shoot again. 

Another means of solving two-point boundary value problems is the finite difference
method, where the differential equations are approximated by finite differences
at evenly spaced mesh points. As a consequence, a differential equation is transformed
into a set of simultaneous algebraic equations.

The two methods have a common problem: they give rise to nonlinear sets of
equations if the differential equation is not linear. As we noted in Chapter 4, all methods
of solving nonlinear equations are iterative procedures that can consume a lot of
computational resources. Thus solution of nonlinear boundary value problems is not
cheap. Another complication is that iterative methods need reasonably good starting
values in order to converge. Since there is no set formula for determining these, an
algorithm for solving nonlinear boundary value problems requires intelligent input;
it cannot be treated as a "black box."


8.2 Shooting Method 

Second-Order Differential Equation 

The simplest two-point boundary value problem is a second-order differential equation
with one condition specified at x = a and another one at x = b. Here is an example
of a second-order boundary value problem:

$$y'' = f(x, y, y'), \quad y(a) = \alpha, \quad y(b) = \beta \tag{8.1}$$

Let us now attempt to turn Eqs. (8.1) into the initial value problem

$$y'' = f(x, y, y'), \quad y(a) = \alpha, \quad y'(a) = u \tag{8.2}$$

The key to success is finding the correct value of u. This could be done by trial and
error: guess u and solve the initial value problem by marching from x = a to b. If
the solution agrees with the prescribed boundary condition $y(b) = \beta$, we are done;
otherwise we have to adjust u and try again. Clearly, this procedure is very tedious.

More systematic methods become available to us if we realize that the determination
of u is a root-finding problem. Because the solution of the initial value problem
depends on u, the computed boundary value y(b) is a function of u, that is

$$y(b) = \theta(u)$$

Hence u is a root of

$$r(u) = \theta(u) - \beta = 0 \tag{8.3}$$

where r(u) is the boundary residual (difference between the computed and specified
boundary values). Equation (8.3) can be solved by any one of the root-finding methods
discussed in Chapter 4. We reject the method of bisection because it involves too many
evaluations of $\theta(u)$. In the Newton-Raphson method we run into the problem of having
to compute $d\theta/du$, which can be done, but not easily. That leaves Brent's algorithm
as our method of choice.

Here is the procedure we use in solving nonlinear boundary value problems: 

1. Specify the starting values $u_1$ and $u_2$ which must bracket the root $u$ of Eq. (8.3).

2. Apply Brent's method to solve Eq. (8.3) for u. Note that each iteration requires
evaluation of $\theta(u)$ by solving the differential equation as an initial value problem.

3. Having determined the value of u, solve the differential equations once more and
record the results.

If the differential equation is linear, any root-finding method will need only one
interpolation to determine u. But since Brent's method uses quadratic interpolation,
it needs three points: $u_1$, $u_2$ and $u_3$, the latter being provided by a bisection step. This
is wasteful, since linear interpolation with $u_1$ and $u_2$ would also result in the correct
value of u. Therefore, we replace Brent's method with linear interpolation whenever
the differential equation is linear.

■ linInterp

Here is the algorithm for linear interpolation:

function root = linInterp(func,x1,x2)
% Finds the zero of the linear function f(x) by straight
% line interpolation between x1 and x2.
% func = handle of function that returns f(x).

f1 = feval(func,x1); f2 = feval(func,x2);
root = x2 - f2*(x2 - x1)/(f2 - f1);
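
To see why a single interpolation suffices for a linear differential equation, consider the sketch below. It assumes a problem that is not taken from the text, y'' = y with y(0) = 0 and y(1) = 1, chosen because its initial value solution y = u sinh x is known in closed form; that closed form stands in for the numerical integration that would normally supply θ(u).

```matlab
% Linear shooting residual for the assumed problem y'' = y, y(0) = 0, y(1) = 1.
% With y'(0) = u the solution is y = u*sinh(x), so theta(u) = u*sinh(1).
theta = @(u) u*sinh(1.0);          % computed boundary value y(1) for a given u
r     = @(u) theta(u) - 1.0;       % boundary residual, Eq. (8.3)
u1 = 0.0; u2 = 1.0;                % arbitrary trial values of u
r1 = r(u1); r2 = r(u2);
u = u2 - r2*(u2 - u1)/(r2 - r1);   % same formula as in linInterp
fprintf('u = %.6f, residual = %.1e\n', u, r(u));
```

Because r(u) is a straight line, the interpolated u equals 1/sinh(1) up to roundoff, and the trial values need not bracket the root.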

EXAMPLE 8.1 

Solve the nonlinear boundary value problem 

y" + 3yy' = 0 y(0) = 0 y(2) = 1 
Solution The equivalent first-order equations are 



y 2 

3yiy 2 


with the boundary conditions 


yi(0)=0 yi(2) = l 


Now comes the daunting task of estimating the trial values of y_2(0) = y'(0), the 
unspecified initial condition. We could always pick two numbers at random and hope 
for the best. However, it is possible to reduce the element of chance with a little 
detective work. We start by making the reasonable assumption that y is smooth (does 
not wiggle) in the interval 0 < x < 2. Next we note that y has to increase from 0 to 1, 
which requires y' > 0. Since both y and y' are positive, we conclude that y'' must be 
negative in order to satisfy the differential equation. Now we are in a position to make 
a rough sketch of y: 



Looking at the sketch it is clear that y'(0) > 0.5, so that y'(0) = 1 and 2 appear to be 
reasonable values for the brackets of y'(0); if they are not, Brent’s method will display 
an error message. 

In the program listed below we chose the nonadaptive Runge-Kutta method 
(runKut4) for integration. Note that three user-supplied functions are needed to 
describe the problem at hand. Apart from the function dEqs(x,y) that defines the 
differential equations, we also need the functions inCond(u) to specify the initial 
conditions for integration, and residual(u) that provides Brent’s method with the 
boundary residual. By changing a few statements in these functions, the program 
can be applied to any second-order boundary value problem. It also works for 
third-order equations if integration is started at the end where two of the three boundary 
conditions are specified. 


function shoot2 
% Shooting method for 2nd-order boundary value problem 
% in Example 8.1. 
global XSTART XSTOP H        % Make these params. global. 
XSTART = 0; XSTOP = 2;       % Range of integration. 
H = 0.1;                     % Step size. 
freq = 2;                    % Frequency of printout. 
u1 = 1; u2 = 2;              % Trial values of unknown 
                             % initial condition u. 
x = XSTART; 
u = brent(@residual,u1,u2); 
[xSol,ySol] = runKut4(@dEqs,x,inCond(u),XSTOP,H); 
printSol(xSol,ySol,freq) 

function F = dEqs(x,y)       % First-order differential 
F = [y(2), -3*y(1)*y(2)];    % equations. 

function y = inCond(u)       % Initial conditions (u is 
y = [0 u];                   % the unknown condition). 

function r = residual(u)     % Boundary residual. 
global XSTART XSTOP H 
x = XSTART; 
[xSol,ySol] = runKut4(@dEqs,x,inCond(u),XSTOP,H); 
r = ySol(size(ySol,1),1) - 1; 

Here is the solution: 

>>      x             y1            y2 
   0.0000e+000   0.0000e+000   1.5145e+000 
   2.0000e-001   2.9404e-001   1.3848e+000 
   4.0000e-001   5.4170e-001   1.0743e+000 
   6.0000e-001   7.2187e-001   7.3287e-001 
   8.0000e-001   8.3944e-001   4.5752e-001 
   1.0000e+000   9.1082e-001   2.7013e-001 
   1.2000e+000   9.5227e-001   1.5429e-001 
   1.4000e+000   9.7572e-001   8.6471e-002 
   1.6000e+000   9.8880e-001   4.7948e-002 
   1.8000e+000   9.9602e-001   2.6430e-002 
   2.0000e+000   1.0000e+000   1.4522e-002 

Note that y'(0) = 1.5145, so that our initial guesses of 1.0 and 2.0 were on the 
mark. 


EXAMPLE 8.2 

Numerical integration of the initial value problem 

\[ y'' + 4y = 4x \qquad y(0) = 0 \qquad y'(0) = 0 \]

yielded y'(2) = 1.65364. Use this information to determine the value of y'(0) that 
would result in y'(2) = 0. 

Solution We use linear interpolation 

\[ u = u_2 - \theta(u_2)\,\frac{u_2 - u_1}{\theta(u_2) - \theta(u_1)} \]

where in our case u = y'(0) and θ(u) = y'(2). So far we are given u_1 = 0 and θ(u_1) = 
1.65364. To obtain the second point, we need another solution of the initial value 
problem. An obvious solution is y = x, which gives us y(0) = 0 and y'(0) = y'(2) = 1. 
Thus the second point is u_2 = 1 and θ(u_2) = 1. Linear interpolation now yields 

\[ y'(0) = u = 1 - (1)\,\frac{1 - 0}{1 - 1.65364} = 2.52989 \]

Since the problem is linear, no further iterations are needed. 
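
The arithmetic of this example is easy to reproduce. The short script below is a sketch, not part of the original example; the closed-form cross-check is an added assumption based on the general solution y = x + A sin 2x of the differential equation with y(0) = 0, for which y'(2) = 1 + (u − 1) cos 4.

```matlab
% Linear-interpolation check for Example 8.2 (illustrative script).
u1 = 0;  theta1 = 1.65364;   % given: y'(0) = 0 produced y'(2) = 1.65364
u2 = 1;  theta2 = 1;         % from the solution y = x
u = u2 - theta2*(u2 - u1)/(theta2 - theta1);
% Closed-form cross-check (assumption, not in the text): setting
% y'(2) = 1 + (u - 1)*cos(4) to zero gives u = 1 - 1/cos(4).
uExact = 1 - 1/cos(4);
fprintf('interpolated u = %.5f, closed-form u = %.5f\n',u,uExact);
```

The two values agree to within the rounding of the given y'(2), which is consistent with the remark that a linear problem needs no further iterations.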
EXAMPLE 8.3 

Solve the third-order boundary value problem 

\[ y''' = 2y'' + 6xy \qquad y(0) = 2 \qquad y(5) = y'(5) = 0 \]

and plot y vs. x. 

Solution The first-order equations and the boundary conditions are 

\[ \mathbf{y}' = \begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix} = \begin{bmatrix} y_2 \\ y_3 \\ 2y_3 + 6xy_1 \end{bmatrix} \]

\[ y_1(0) = 2 \qquad y_1(5) = y_2(5) = 0 \]

The program listed below is based on shoot2 in Example 8.1. Because two of the 
three boundary conditions are specified at the right end, we start the integration at 
x = 5 and proceed with negative h toward x = 0. Two of the three initial conditions 
are prescribed as y_1(5) = y_2(5) = 0, whereas the third condition y_3(5) is unknown. 
Because the differential equation is linear, the two guesses for y_3(5) (u_1 and u_2) are 
not important; we left them as they were in Example 8.1. The adaptive Runge-Kutta 
method (runKut5) was chosen for the integration. 


function shoot3 
% Shooting method for 3rd-order boundary value 
% problem in Example 8.3. 
global XSTART XSTOP H        % Make these params. global. 
XSTART = 5; XSTOP = 0;       % Range of integration. 
H = -0.1;                    % Step size. 
freq = 2;                    % Frequency of printout. 
u1 = 1; u2 = 2;              % Trial values of unknown 
                             % initial condition u. 
x = XSTART; 
u = linInterp(@residual,u1,u2); 
[xSol,ySol] = runKut5(@dEqs,x,inCond(u),XSTOP,H); 
printSol(xSol,ySol,freq) 

function F = dEqs(x,y)       % 1st-order differential eqs. 
F = [y(2), y(3), 2*y(3) + 6*x*y(1)]; 

function y = inCond(u)       % Initial conditions. 
y = [0 0 u]; 

function r = residual(u)     % Boundary residual. 
global XSTART XSTOP H 
x = XSTART; 
[xSol,ySol] = runKut5(@dEqs,x,inCond(u),XSTOP,H); 
r = ySol(size(ySol,1),1) - 2; 

We skip the rather long printout of the solution and show just the plot: 



Higher-Order Equations 

Consider the fourth-order differential equation 

\[ y^{(4)} = f(x, y, y', y'', y''') \tag{8.4a} \]

with the boundary conditions 

\[ y(a) = \alpha_1 \qquad y''(a) = \alpha_2 \qquad y(b) = \beta_1 \qquad y''(b) = \beta_2 \tag{8.4b} \]

To solve Eq. (8.4a) with the shooting method, we need four initial conditions at x = a, 
only two of which are specified. Denoting the two unknown initial values by u_1 and 
u_2, we have the set of initial conditions 

\[ y(a) = \alpha_1 \qquad y'(a) = u_1 \qquad y''(a) = \alpha_2 \qquad y'''(a) = u_2 \tag{8.5} \]

If Eq. (8.4a) is solved with the shooting method using the initial conditions in 
Eq. (8.5), the computed boundary values at x = b depend on the choice of u_1 and u_2. 
We express this dependence as 

\[ y(b) = \theta_1(u_1, u_2) \qquad y''(b) = \theta_2(u_1, u_2) \tag{8.6} \]

The correct choice of u_1 and u_2 yields the given boundary conditions at x = b; that is, 
it satisfies the equations 

\[ \theta_1(u_1, u_2) = \beta_1 \qquad \theta_2(u_1, u_2) = \beta_2 \]

or, using vector notation 

\[ \boldsymbol{\theta}(\mathbf{u}) = \boldsymbol{\beta} \tag{8.7} \]

These are simultaneous (generally nonlinear) equations that can be solved by the 
Newton-Raphson method discussed in Art. 4.6. It must be pointed out again that 
intelligent estimates of u_1 and u_2 are needed if the differential equation is not linear. 

EXAMPLE 8.4 



The displacement v of the simply supported beam can be obtained by solving the 
boundary value problem 

\[ \frac{d^4v}{dx^4} = \frac{w_0}{EI}\,\frac{x}{L} \qquad v = \frac{d^2v}{dx^2} = 0 \ \text{ at } x = 0 \text{ and } x = L \]

where EI is the bending rigidity. Determine by numerical integration the slopes at 
the two ends and the displacement at mid-span. 

Solution Introducing the dimensionless variables 

\[ \xi = \frac{x}{L} \qquad y = \frac{EI}{w_0L^4}\,v \]

the problem is transformed to 

\[ \frac{d^4y}{d\xi^4} = \xi \qquad y = \frac{d^2y}{d\xi^2} = 0 \ \text{ at } \xi = 0 \text{ and } \xi = 1 \]

The equivalent first-order equations and the boundary conditions are (the prime 
denotes d/dξ) 

\[ \mathbf{y}' = \begin{bmatrix} y_1' \\ y_2' \\ y_3' \\ y_4' \end{bmatrix} = \begin{bmatrix} y_2 \\ y_3 \\ y_4 \\ \xi \end{bmatrix} \]

\[ y_1(0) = y_3(0) = y_1(1) = y_3(1) = 0 \]

The program listed below is similar to the one in Example 8.1. With appropriate 
changes in functions dEqs(x,y), inCond(u) and residual(u) the program can 
solve boundary value problems of any order greater than two. For the problem at 
hand we chose the Bulirsch-Stoer algorithm to do the integration because it gives us 
control over the printout (we need y precisely at mid-span). The nonadaptive Runge- 
Kutta method could also be used here, but we would have to guess a suitable step 
size h. 
function shoot4 
% Shooting method for 4th-order boundary value 
% problem in Example 8.4. 
global XSTART XSTOP H        % Make these params. global. 
XSTART = 0; XSTOP = 1;       % Range of integration. 
H = 0.5;                     % Step size. 
freq = 1;                    % Frequency of printout. 
u = [0 1];                   % Trial values of u(1) 
                             % and u(2). 
x = XSTART; 
u = newtonRaphson2(@residual,u); 
[xSol,ySol] = bulStoer(@dEqs,x,inCond(u),XSTOP,H); 
printSol(xSol,ySol,freq) 

function F = dEqs(x,y)       % Differential equations. 
F = [y(2) y(3) y(4) x]; 

function y = inCond(u)       % Initial conditions; u(1) 
y = [0 u(1) 0 u(2)];         % and u(2) are unknowns. 

function r = residual(u)     % Boundary residuals. 
global XSTART XSTOP H 
r = zeros(length(u),1); 
x = XSTART; 
[xSol,ySol] = bulStoer(@dEqs,x,inCond(u),XSTOP,H); 
lastRow = size(ySol,1); 
r(1) = ySol(lastRow,1); 
r(2) = ySol(lastRow,3); 

Here is the output: 

>>      x             y1             y2             y3             y4 
   0.0000e+000   0.0000e+000    1.9444e-002    0.0000e+000   -1.6667e-001 
   5.0000e-001   6.5104e-003    1.2150e-003   -6.2500e-002   -4.1667e-002 
   1.0000e+000  -4.8369e-017   -2.2222e-002   -5.8395e-018    3.3333e-001 



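Noting that 

\[ \frac{dv}{dx} = \frac{dv}{d\xi}\,\frac{d\xi}{dx} = \left(\frac{w_0L^4}{EI}\,\frac{dy}{d\xi}\right)\frac{1}{L} = \frac{w_0L^3}{EI}\,\frac{dy}{d\xi} \]

we obtain 

\[ \left.\frac{dv}{dx}\right|_{x=0} = 19.444\times10^{-3}\,\frac{w_0L^3}{EI} \]

\[ \left.\frac{dv}{dx}\right|_{x=L} = -22.222\times10^{-3}\,\frac{w_0L^3}{EI} \]

\[ v|_{x=0.5L} = 6.5104\times10^{-3}\,\frac{w_0L^4}{EI} \]

which agree with the analytical solution (easily obtained by direct integration of the 
differential equation). 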
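
The direct integration mentioned above can be checked with a few lines of MATLAB. The closed form used here is derived for this sketch rather than quoted from the text: integrating y'''' = ξ four times and applying y(0) = y''(0) = y(1) = y''(1) = 0 gives y = ξ^5/120 − ξ^3/36 + 7ξ/360.

```matlab
% Analytical check for Example 8.4 (illustrative script).
% Closed form derived by integrating y'''' = xi four times with
% y(0) = y''(0) = y(1) = y''(1) = 0 (an assumption of this sketch).
y  = @(xi) xi.^5/120 - xi.^3/36 + 7*xi/360;   % dimensionless displacement
dy = @(xi) xi.^4/24  - xi.^2/12 + 7/360;      % dimensionless slope
slope0 = dy(0);    % end slope at xi = 0, in units of w0*L^3/(EI)
slope1 = dy(1);    % end slope at xi = 1, in units of w0*L^3/(EI)
yMid   = y(0.5);   % mid-span displacement, in units of w0*L^4/(EI)
fprintf('%.4e  %.4e  %.4e\n',slope0,slope1,yMid);
```

The three values, 7/360, −8/360 and 25/3840, match the shooting-method results to the digits printed.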


EXAMPLE 8.5 

Solve the nonlinear differential equation 

\[ y^{(4)} + \frac{4}{x}\,y^3 = 0 \]

with the boundary conditions 

\[ y(0) = y'(0) = 0 \qquad y''(1) = 0 \qquad y'''(1) = 1 \]

and plot y vs. x. 

Solution Our first task is to handle the indeterminacy of the differential equation 
at the origin, where x = y = 0. The problem is resolved by applying L’Hospital’s rule: 
4y^3/x → 12y^2 y' as x → 0. Thus the equivalent first-order equations and the boundary 
conditions that we use in the solution are 

\[ \mathbf{y}' = \begin{bmatrix} y_1' \\ y_2' \\ y_3' \\ y_4' \end{bmatrix} = \begin{bmatrix} y_2 \\ y_3 \\ y_4 \\ \begin{cases} -12y_1^2y_2 & \text{near } x = 0 \\ -4y_1^3/x & \text{otherwise} \end{cases} \end{bmatrix} \]

\[ y_1(0) = y_2(0) = 0 \qquad y_3(1) = 0 \qquad y_4(1) = 1 \]

Because the problem is nonlinear, we need reasonable estimates for y''(0) and 
y'''(0). On the basis of the boundary conditions y''(1) = 0 and y'''(1) = 1, the plot of 
y'' is likely to look something like this: 

[Figure: rough sketch of y'' vs. x] 

If we are right, then y''(0) < 0 and y'''(0) > 0. Based on this rather scanty information, 
we try y''(0) = −1 and y'''(0) = 1. 

The following program uses the adaptive Runge-Kutta method (runKut5) for 
integration: 


function shoot4nl 
% Shooting method for nonlinear 4th-order boundary 
% value problem in Example 8.5. 
global XSTART XSTOP H        % Make these params. global. 
XSTART = 0; XSTOP = 1;       % Range of integration. 
H = 0.1;                     % Step size. 
freq = 1;                    % Frequency of printout. 
u = [-1 1];                  % Trial values of u(1) 
                             % and u(2). 
x = XSTART; 
u = newtonRaphson2(@residual,u); 
[xSol,ySol] = runKut5(@dEqs,x,inCond(u),XSTOP,H); 
printSol(xSol,ySol,freq) 

function F = dEqs(x,y)       % Differential equations. 
F = zeros(1,4); 
F(1) = y(2); F(2) = y(3); F(3) = y(4); 
if x < 10.0e-4; F(4) = -12*y(2)*y(1)^2; 
else; F(4) = -4*(y(1)^3)/x; 
end 

function y = inCond(u)       % Initial conditions; u(1) 
y = [0 0 u(1) u(2)];         % and u(2) are unknowns. 

function r = residual(u)     % Boundary residuals. 
global XSTART XSTOP H 
r = zeros(length(u),1); 
x = XSTART; 
[xSol,ySol] = runKut5(@dEqs,x,inCond(u),XSTOP,H); 
lastRow = size(ySol,1); 
r(1) = ySol(lastRow,3); 
r(2) = ySol(lastRow,4) - 1; 
The results are: 

>>      x             y1             y2             y3             y4 
   0.0000e+000    0.0000e+000    0.0000e+000   -9.7607e-001   9.7131e-001 
   1.0000e-001   -4.7184e-003   -9.2750e-002   -8.7893e-001   9.7131e-001 
   3.9576e-001   -6.6403e-002   -3.1022e-001   -5.9165e-001   9.7152e-001 
   7.0683e-001   -1.8666e-001   -4.4722e-001   -2.8896e-001   9.7627e-001 
   9.8885e-001   -3.2061e-001   -4.8968e-001   -1.1144e-002   9.9848e-001 
   1.0000e+000   -3.2607e-001   -4.8975e-001    6.4879e-016   1.0000e+000 

[Figure: plot of y vs. x] 

By good fortune, our initial estimates y''(0) = −1 and y'''(0) = 1 were very close to the 
final values. 

PROBLEM SET 8.1 

1. Numerical integration of the initial value problem 

y'' + y' − y = 0    y(0) = 0    y'(0) = 1 

yielded y(1) = 0.741028. What is the value of y'(0) that would result in y(1) = 1, 
assuming that y(0) is unchanged? 

2. The solution of the differential equation 

y''' + y'' + 2y' = 6 

with the initial conditions y(0) = 2, y'(0) = 0 and y''(0) = 1, yielded y(1) = 
3.03765. When the solution was repeated with y''(0) = 0 (the other conditions 
being unchanged), the result was y(1) = 2.72318. Determine the value of y''(0) so 
that y(1) = 0. 
3. Roughly sketch the solution of the following boundary value problems. Use the 
sketch to estimate y'(0) for each problem. 

(a) y'' = −e^(−y)    y(0) = 1    y(1) = 0.5 

(b) y'' = 4y^2    y(0) = 10    y'(1) = 0 

(c) y'' = cos(xy)    y(0) = 1    y(1) = 2 

4. Using a rough sketch of the solution, estimate y(0) for the following boundary 
value problems. 

(a) y'' = y^2 + xy    y'(0) = 0    y(1) = 2 

(b) y'' = −(2/x)y' − y^2    y'(0) = 0    y(1) = 2 

(c) y'' = −x(y')^2    y'(0) = 2    y(1) = 1 

5. Obtain a rough estimate of y''(0) for the boundary value problem 

y''' + 5y''y' = 0 

y(0) = 0    y'(0) = 1    y(1) = 0 

6. Obtain rough estimates of y''(0) and y'''(0) for the boundary value problem 

y^(4) + 2y'' + y' sin y = 0 
y(0) = y'(0) = 0    y(1) = 5    y'(1) = 0 

7. Obtain rough estimates of ẋ(0) and ẏ(0) for the boundary value problem 

\[ \ddot{x} + 2x^2 - y = 0 \qquad x(0) = 1 \qquad x(1) = 0 \]
\[ \ddot{y} + y^2 - 2x = 1 \qquad y(0) = 0 \qquad y(1) = 1 \]

8. ■ Solve the boundary value problem 

y'' + (1 − 0.2x)y^2 = 0    y(0) = 0    y(π/2) = 1 

9. ■ Solve the boundary value problem 

y'' + 2y' + 3y^2 = 0    y(0) = 0    y(2) = −1 

10. ■ Solve the boundary value problem 

y'' + sin y + 1 = 0    y(0) = 0    y(π) = 0 

11. ■ Solve the boundary value problem 

y'' + (1/x)y' + y = 0    y(0.01) = 1    y'(2) = 0 

and plot y vs. x. Warning: y changes very rapidly near x = 0. 

12. ■ Solve the boundary value problem 

y'' − (1 − e^(−x))y = 0    y(0) = 1    y(∞) = 0 

and plot y vs. x. Hint: Replace the infinity by a finite value β. Check your choice of 
β by repeating the solution with 1.5β. If the results change, you must increase β. 

13. ■ Solve the boundary value problem 

y''' = −(1/x)y'' + (1/x^2)y' + 0.1(y')^3 

y(1) = 0    y''(1) = 0    y(2) = 1 

14. ■ Solve the boundary value problem 

y''' + 4y'' + 6y' = 10 
y(0) = y''(0) = 0    y(3) − y'(3) = 5 

15. ■ Solve the boundary value problem 

y''' + 2y'' + sin y = 0 
y(−1) = 0    y'(−1) = −1    y'(1) = 1 

16. ■ Solve the differential equation in Prob. 15 with the boundary conditions 

y(−1) = 0    y(0) = 0    y(1) = 1 

(this is a three-point boundary value problem). 

17. ■ Solve the boundary value problem 

y^(4) = −xy^2 

y(0) = 5    y''(0) = 0    y'(1) = 0    y'''(1) = 2 

18. ■ Solve the boundary value problem 

y^(4) = −2yy'' 

y(0) = y'(0) = 0    y(4) = 0    y'(4) = 1 
19. ■ 

[Figure: projectile trajectory in the xy-plane, from launch at t = 0 to a target 8000 m away reached at t = 10 s] 

A projectile of mass m in free flight experiences the aerodynamic drag force 
F_d = cv^2, where v is the velocity. The resulting equations of motion are 

\[ \ddot{x} = -\frac{c}{m}\,v\dot{x} \qquad \ddot{y} = -\frac{c}{m}\,v\dot{y} - g \qquad v = \sqrt{\dot{x}^2 + \dot{y}^2} \]

If the projectile hits a target 8 km away after a 10 s flight, determine the launch 
velocity v_0 and its angle of inclination θ. Use m = 20 kg, c = 3.2 × 10^−4 kg/m and 
g = 9.80665 m/s^2. 


20. ■ 

[Figure: simply supported beam under the uniform load w_0 and the tensile force N] 

The simply supported beam carries a uniform load of intensity w_0 and the tensile 
force N. The differential equation for the vertical displacement v can be shown 
to be 

\[ \frac{d^4v}{dx^4} - \frac{N}{EI}\,\frac{d^2v}{dx^2} = \frac{w_0}{EI} \]

where EI is the bending rigidity. The boundary conditions are v = d^2v/dx^2 = 0 
at x = 0 and x = L. Changing the variables to ξ = x/L and y = (EI/(w_0L^4))v transforms 
the problem to the dimensionless form 

\[ \frac{d^4y}{d\xi^4} - \beta\,\frac{d^2y}{d\xi^2} = 1 \qquad \beta = \frac{NL^2}{EI} \]

\[ y|_{\xi=0} = \left.\frac{d^2y}{d\xi^2}\right|_{\xi=0} = y|_{\xi=1} = \left.\frac{d^2y}{d\xi^2}\right|_{\xi=1} = 0 \]

Determine the maximum displacement if (a) β = 1.65929 and (b) β = −1.65929 
(N is compressive). 

21. ■ Solve the boundary value problem 

y''' + yy'' = 0    y(0) = y'(0) = 0    y'(∞) = 2 

and plot y(x) and y'(x). This problem arises in determining the velocity profile of 
the boundary layer in incompressible flow (Blasius solution). 


8.3 Finite Difference Method 



Figure 8.1. Finite difference mesh 


In the finite difference method we divide the range of integration (a, b) into n − 1 
equal subintervals of length h each, as shown in Fig. 8.1. The values of the numerical 
solution at the mesh points are denoted by y_i, i = 1, 2, ..., n; the two points outside 
(a, b) will be explained shortly. We then make two approximations: 

1. The derivatives of y in the differential equation are replaced by the finite difference 
expressions. It is common practice to use the first central difference approximations 
(see Chapter 5): 

\[ y_i' = \frac{y_{i+1} - y_{i-1}}{2h} \qquad y_i'' = \frac{y_{i-1} - 2y_i + y_{i+1}}{h^2} \qquad \text{etc.} \tag{8.8} \]

2. The differential equation is enforced only at the mesh points. 

As a result, the differential equations are replaced by n simultaneous algebraic 
equations, the unknowns being y_i, i = 1, 2, ..., n. If the differential equation is 
nonlinear, the algebraic equations will also be nonlinear and must be solved by the 
Newton-Raphson method. 

Since the truncation error in a first central difference approximation is O(h^2), 
the finite difference method is not as accurate as the shooting method; recall that 
the Runge-Kutta method has a truncation error of O(h^5). Therefore, the convergence 
criterion in the Newton-Raphson method should not be too severe. 
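
The O(h^2) behavior of the central differences can be observed numerically. The script below is a minimal sketch; the test function sin x and the evaluation point x0 = 1 are arbitrary choices made for illustration and do not come from the text.

```matlab
% Truncation-error check for the first central differences
% (illustrative; sin(x) and x0 = 1 are arbitrary choices).
f  = @(x) sin(x);
x0 = 1;
d1 = @(h) (f(x0+h) - f(x0-h))/(2*h);           % approximates y'
d2 = @(h) (f(x0-h) - 2*f(x0) + f(x0+h))/h^2;   % approximates y''
h  = 0.1;
% Errors relative to the exact derivatives cos(x0) and -sin(x0).
err1 = [abs(d1(h) - cos(x0)), abs(d1(h/2) - cos(x0))];
err2 = [abs(d2(h) + sin(x0)), abs(d2(h/2) + sin(x0))];
ratio1 = err1(1)/err1(2);   % error reduction for y' when h is halved
ratio2 = err2(1)/err2(2);   % error reduction for y'' when h is halved
fprintf('ratio for y'': %.3f, ratio for y'''': %.3f\n',ratio1,ratio2);
```

Halving h reduces both errors by a factor close to 4, as expected of a second-order approximation.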
Second-Order Differential Equation 

Consider the second-order differential equation 

\[ y'' = f(x, y, y') \]

with the boundary conditions 

\[ y(a) = \alpha \ \text{ or } \ y'(a) = \alpha \]
\[ y(b) = \beta \ \text{ or } \ y'(b) = \beta \]

Approximating the derivatives at the mesh points by finite differences, the 
problem becomes 

\[ \frac{y_{i-1} - 2y_i + y_{i+1}}{h^2} = f\left(x_i, y_i, \frac{y_{i+1} - y_{i-1}}{2h}\right), \quad i = 1, 2, \ldots, n \tag{8.9} \]

\[ y_1 = \alpha \ \text{ or } \ \frac{y_2 - y_0}{2h} = \alpha \tag{8.10a} \]

\[ y_n = \beta \ \text{ or } \ \frac{y_{n+1} - y_{n-1}}{2h} = \beta \tag{8.10b} \]

Note the presence of y_0 and y_{n+1}, which are associated with points outside the solution 
domain (a, b). This “spillover” can be eliminated by using the boundary conditions. 
But before we do that, let us rewrite Eqs. (8.9) as 

\[ y_0 - 2y_1 + y_2 - h^2 f\left(x_1, y_1, \frac{y_2 - y_0}{2h}\right) = 0 \tag{a} \]

\[ y_{i-1} - 2y_i + y_{i+1} - h^2 f\left(x_i, y_i, \frac{y_{i+1} - y_{i-1}}{2h}\right) = 0, \quad i = 2, 3, \ldots, n-1 \tag{b} \]

\[ y_{n-1} - 2y_n + y_{n+1} - h^2 f\left(x_n, y_n, \frac{y_{n+1} - y_{n-1}}{2h}\right) = 0 \tag{c} \]

The boundary conditions on y are easily dealt with: Eq. (a) is simply replaced 
by y_1 − α = 0 and Eq. (c) is replaced by y_n − β = 0. If y' are prescribed, we obtain 
from Eqs. (8.10) y_0 = y_2 − 2hα and y_{n+1} = y_{n−1} + 2hβ, which are then substituted into 
Eqs. (a) and (c), respectively. Hence we finish up with n equations in the unknowns 
y_i, i = 1, 2, ..., n: 

\[ \left.\begin{aligned} y_1 - \alpha &= 0 && \text{if } y(a) = \alpha \\ -2y_1 + 2y_2 - h^2 f(x_1, y_1, \alpha) - 2h\alpha &= 0 && \text{if } y'(a) = \alpha \end{aligned}\right\} \tag{8.11a} \]

\[ y_{i-1} - 2y_i + y_{i+1} - h^2 f\left(x_i, y_i, \frac{y_{i+1} - y_{i-1}}{2h}\right) = 0, \quad i = 2, 3, \ldots, n-1 \tag{8.11b} \]

\[ \left.\begin{aligned} y_n - \beta &= 0 && \text{if } y(b) = \beta \\ 2y_{n-1} - 2y_n - h^2 f(x_n, y_n, \beta) + 2h\beta &= 0 && \text{if } y'(b) = \beta \end{aligned}\right\} \tag{8.11c} \]
EXAMPLE 8.6 

Write out Eqs. (8.11) for the following linear boundary value problem using n = 11: 

\[ y'' = -4y + 4x \qquad y(0) = 0 \qquad y'(\pi/2) = 0 \]

Solve these equations with a computer program. 

Solution In this case α = 0 (applicable to y), β = 0 (applicable to y') and 
f(x, y, y') = −4y + 4x. Hence Eqs. (8.11) are 

\[ y_1 = 0 \]
\[ y_{i-1} - 2y_i + y_{i+1} - h^2(-4y_i + 4x_i) = 0, \quad i = 2, 3, \ldots, 10 \]
\[ 2y_{10} - 2y_{11} - h^2(-4y_{11} + 4x_{11}) = 0 \]

or, using matrix notation 

\[
\begin{bmatrix}
1 & 0 & & & \\
1 & -2+4h^2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -2+4h^2 & 1 \\
 & & & 2 & -2+4h^2
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{10} \\ y_{11} \end{bmatrix}
=
\begin{bmatrix} 0 \\ 4h^2x_2 \\ \vdots \\ 4h^2x_{10} \\ 4h^2x_{11} \end{bmatrix}
\]

Note that the coefficient matrix is tridiagonal, so that the equations can be solved 
efficiently by the functions LUdec3 and LUsol3 described in Art. 2.4. Recalling that 
these functions store the diagonals of the coefficient matrix in vectors c, d and e, we 
arrive at the following program: 

function fDiff6 
% Finite difference method for the second-order, 
% linear boundary value problem in Example 8.6. 
xStart = 0; xStop = pi/2;    % Range of integration. 
n = 11;                      % Number of mesh points. 
freq = 1;                    % Printout frequency. 
h = (xStop - xStart)/(n-1); 
x = linspace(xStart,xStop,n)'; 
[c,d,e,b] = fDiffEqs(x,h,n); 
[c,d,e] = LUdec3(c,d,e); 
printSol(x,LUsol3(c,d,e,b),freq) 

function [c,d,e,b] = fDiffEqs(x,h,n) 
% Sets up the tridiagonal coefficient matrix and the 
% constant vector of the finite difference equations. 
h2 = h*h; 
d = ones(n,1)*(-2 + 4*h2); 
c = ones(n-1,1); 
e = ones(n-1,1); 
b = ones(n,1)*4*h2.*x; 
d(1) = 1; e(1) = 0; b(1) = 0; c(n-1) = 2; 

The solution is 

>>      x             y1 
   0.0000e+000   0.0000e+000 
   1.5708e-001   3.1417e-001 
   3.1416e-001   6.1284e-001 
   4.7124e-001   8.8203e-001 
   6.2832e-001   1.1107e+000 
   7.8540e-001   1.2917e+000 
   9.4248e-001   1.4228e+000 
   1.0996e+000   1.5064e+000 
   1.2566e+000   1.5500e+000 
   1.4137e+000   1.5645e+000 
   1.5708e+000   1.5642e+000 

The exact solution of the problem is 

\[ y = x + \tfrac{1}{2}\sin 2x \]

which yields y(π/2) = π/2 = 1.57080. Thus the error in the numerical solution is 
about 0.4%. More accurate results can be achieved by increasing n. For example, with 
n = 101, we would get y(π/2) = 1.57073, which is in error by only about 0.004%. 
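
The effect of mesh refinement can be reproduced with a compact function. This is a sketch under stated assumptions: the name fdSketch is hypothetical, and a full matrix with the backslash operator replaces the tridiagonal routines LUdec3 and LUsol3, which is wasteful for large n but keeps the example self-contained.

```matlab
function yEnd = fdSketch(n)
% Finite difference solution of y'' = -4y + 4x, y(0) = 0, y'(pi/2) = 0
% on n mesh points; returns the computed y(pi/2).
% Assumption: a full matrix and backslash replace LUdec3 and LUsol3.
x = linspace(0,pi/2,n)';
h = x(2) - x(1);
d = ones(n,1)*(-2 + 4*h^2);    % main diagonal
c = ones(n-1,1);               % subdiagonal
e = ones(n-1,1);               % superdiagonal
b = 4*h^2*x;                   % constant vector
d(1) = 1; e(1) = 0; b(1) = 0;  % first equation: y(0) = 0
c(n-1) = 2;                    % y'(pi/2) = 0 folded into the last equation
A = diag(d) + diag(c,-1) + diag(e,1);
y = A\b;
yEnd = y(n);
end
```

Calling fdSketch(11) and fdSketch(101) and comparing each result with pi/2 shows the error shrinking by roughly a factor of 100 when h is reduced tenfold, in line with the O(h^2) truncation error.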

EXAMPLE 8.7 

Solve the boundary value problem 

\[ y'' = -3yy' \qquad y(0) = 0 \qquad y(2) = 1 \]

with the finite difference method. (This problem was solved in Example 8.1 by the 
shooting method.) Use n = 11 and compare the results to the solution in Example 8.1. 

Solution As the problem is nonlinear, Eqs. (8.11) must be solved by the 
Newton-Raphson method. The program listed below can be used as a model for other 
second-order boundary value problems. The subfunction residual(y) returns 
the residuals of the finite difference equations, which are the left-hand sides of 
Eqs. (8.11). The differential equation y'' = f(x, y, y') is defined in the subfunction 
y2Prime. In this problem we chose for the initial solution y_i = 0.5x_i, which 
corresponds to the dashed straight line shown in the rough plot of y in Example 8.1. 
Note that we relaxed the convergence criterion in the Newton-Raphson method to 
1.0 × 10^−5, which is more in line with the truncation error in the finite difference 
method. 


function fDiff7 
% Finite difference method for the second-order, 
% nonlinear boundary value problem in Example 8.7. 
global N H X                 % Make these params. global. 
xStart = 0; xStop = 2;       % Range of integration. 
N = 11;                      % Number of mesh points. 
freq = 1;                    % Printout frequency. 
X = linspace(xStart,xStop,N)'; 
y = 0.5*X;                   % Starting values of y. 
H = (xStop - xStart)/(N-1); 
y = newtonRaphson2(@residual,y,1.0e-5); 
printSol(X,y,freq) 

function r = residual(y) 
% Residuals of finite difference equations (left-hand 
% sides of Eqs. (8.11)). 
global N H X 
r = zeros(N,1); 
r(1) = y(1); r(N) = y(N) - 1; 
for i = 2:N-1 
    r(i) = y(i-1) - 2*y(i) + y(i+1)... 
         - H*H*y2Prime(X(i),y(i),(y(i+1) - y(i-1))/(2*H)); 
end 

function F = y2Prime(x,y,yPrime) 
% Second-order differential equation F = y''. 
F = -3*y*yPrime; 
Here is the output from the program: 

>>      x             y1 
   0.0000e+000   0.0000e+000 
   2.0000e-001   3.0240e-001 
   4.0000e-001   5.5450e-001 
   6.0000e-001   7.3469e-001 
   8.0000e-001   8.4979e-001 
   1.0000e+000   9.1813e-001 
   1.2000e+000   9.5695e-001 
   1.4000e+000   9.7846e-001 
   1.6000e+000   9.9020e-001 
   1.8000e+000   9.9657e-001 
   2.0000e+000   1.0000e+000 


The maximum discrepancy between the above solution and the one in 
Example 8.1 occurs at x = 0.6. In Example 8.1 we have y(0.6) = 0.72187, so that the 
difference between the solutions is 

\[ \frac{0.73469 - 0.72187}{0.72187} \times 100\% \approx 1.8\% \]

As the shooting method used in Example 8.1 is considerably more accurate than the 
finite difference method, the discrepancy can be attributed to truncation errors in 
the finite difference solution. This error would be acceptable in many engineering 
problems. Again, accuracy can be increased by using a finer mesh. With n = 101 we 
can reduce the error to 0.07%, but we must question whether the tenfold increase in 
computation time is really worth the extra precision. 


Fourth-Order Differential Equation 

For the sake of brevity we limit our discussion to the special case where y' and y''' do 
not appear explicitly in the differential equation; that is, we consider 

\[ y^{(4)} = f(x, y, y'') \]

We assume that two boundary conditions are prescribed at each end of the solution 
domain (a, b). Problems of this form are commonly encountered in beam theory. 

Again we divide the solution domain into n − 1 intervals of length h each. 
Replacing the derivatives of y by finite differences at the mesh points, we get the finite 
difference equations 

\[ \frac{y_{i-2} - 4y_{i-1} + 6y_i - 4y_{i+1} + y_{i+2}}{h^4} = f\left(x_i, y_i, \frac{y_{i-1} - 2y_i + y_{i+1}}{h^2}\right) \tag{8.12} \]

where i = 1, 2, ..., n. It is more revealing to write these equations as 

\[ y_{-1} - 4y_0 + 6y_1 - 4y_2 + y_3 - h^4 f\left(x_1, y_1, \frac{y_0 - 2y_1 + y_2}{h^2}\right) = 0 \tag{8.13a} \]

\[ y_0 - 4y_1 + 6y_2 - 4y_3 + y_4 - h^4 f\left(x_2, y_2, \frac{y_1 - 2y_2 + y_3}{h^2}\right) = 0 \tag{8.13b} \]

\[ y_1 - 4y_2 + 6y_3 - 4y_4 + y_5 - h^4 f\left(x_3, y_3, \frac{y_2 - 2y_3 + y_4}{h^2}\right) = 0 \tag{8.13c} \]

\[ \vdots \]

\[ y_{n-3} - 4y_{n-2} + 6y_{n-1} - 4y_n + y_{n+1} - h^4 f\left(x_{n-1}, y_{n-1}, \frac{y_{n-2} - 2y_{n-1} + y_n}{h^2}\right) = 0 \tag{8.13d} \]

\[ y_{n-2} - 4y_{n-1} + 6y_n - 4y_{n+1} + y_{n+2} - h^4 f\left(x_n, y_n, \frac{y_{n-1} - 2y_n + y_{n+1}}{h^2}\right) = 0 \tag{8.13e} \]

We now see that there are four unknowns that lie outside the solution domain: y_{−1}, y_0, 
y_{n+1} and y_{n+2}. This “spillover” can be eliminated by applying the boundary conditions, 
a task that is facilitated by Table 8.1. 

Bound. cond.      Equivalent finite difference expression 

y(a) = α          y_1 = α 
y'(a) = α         y_0 = y_2 − 2hα 
y''(a) = α        y_0 = 2y_1 − y_2 + h^2 α 
y'''(a) = α       y_{−1} = 2y_0 − 2y_2 + y_3 − 2h^3 α 

y(b) = β          y_n = β 
y'(b) = β         y_{n+1} = y_{n−1} + 2hβ 
y''(b) = β        y_{n+1} = 2y_n − y_{n−1} + h^2 β 
y'''(b) = β       y_{n+2} = 2y_{n+1} − 2y_{n−1} + y_{n−2} + 2h^3 β 

Table 8.1 


The astute observer may notice that some combinations of boundary conditions 
will not work in eliminating the “spillover.” One such combination is clearly y(a) = α_1 
and y'''(a) = α_2. The other one is y'(a) = α_1 and y''(a) = α_2. In the context of beam 
theory, this makes sense: we can impose either a displacement y or a shear force EIy''' 
at a point, but it is impossible to enforce both of them simultaneously. Similarly, it 
makes no physical sense to prescribe both the slope y' and the bending moment EIy'' 
at the same point. 

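EXAMPLE 8.8 

[Figure: uniform beam of length L attached to rigid supports at both ends, with the concentrated load P at mid-span] 

The uniform beam of length L and bending rigidity EI is attached to rigid supports 
at both ends. The beam carries a concentrated load P at its mid-span. If we utilize 
symmetry and model only the left half of the beam, the displacement v can be obtained 
by solving the boundary value problem 

\[ \frac{d^4v}{dx^4} = 0 \]

\[ v|_{x=0} = 0 \qquad \left.\frac{dv}{dx}\right|_{x=0} = 0 \qquad \left.\frac{dv}{dx}\right|_{x=L/2} = 0 \qquad EI\left.\frac{d^3v}{dx^3}\right|_{x=L/2} = -P/2 \]

Use the finite difference method to determine the displacement and the bending 
moment M = −EI(d^2v/dx^2) at the mid-span (the exact values are v = PL^3/(192EI) 
and M = PL/8). 

Solution By introducing the dimensionless variables 

\[ \xi = \frac{x}{L} \qquad y = \frac{EI}{PL^3}\,v \]

the problem becomes 

\[ \frac{d^4y}{d\xi^4} = 0 \]

\[ y|_{\xi=0} = 0 \qquad \left.\frac{dy}{d\xi}\right|_{\xi=0} = 0 \qquad \left.\frac{dy}{d\xi}\right|_{\xi=1/2} = 0 \qquad \left.\frac{d^3y}{d\xi^3}\right|_{\xi=1/2} = -\frac{1}{2} \]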

We now proceed to writing Eqs. (8.13) taking into account the boundary 
conditions. Referring to Table 8.1, we obtain the finite difference expressions of the 
boundary conditions at the left end as y_1 = 0 and y_0 = y_2. Hence Eqs. (8.13a) and (8.13b) 
become 

\[ y_1 = 0 \tag{a} \]

\[ -4y_1 + 7y_2 - 4y_3 + y_4 = 0 \tag{b} \]

Equation (8.13c) is 

\[ y_1 - 4y_2 + 6y_3 - 4y_4 + y_5 = 0 \tag{c} \]
At the mid-span the boundary conditions are equivalent to y_{n+1} = y_{n−1} and 

\[ y_{n+2} = 2y_{n+1} - 2y_{n-1} + y_{n-2} + 2h^3(-1/2) \]

Substitution into Eqs. (8.13d) and (8.13e) yields 

\[ y_{n-3} - 4y_{n-2} + 7y_{n-1} - 4y_n = 0 \tag{d} \]

\[ 2y_{n-2} - 8y_{n-1} + 6y_n = h^3 \tag{e} \]

The coefficient matrix of Eqs. (a)-(e) can be made symmetric by dividing Eq. (e) by 2. 
The result is 

\[
\begin{bmatrix}
1 & 0 & 0 & & & & \\
0 & 7 & -4 & 1 & & & \\
0 & -4 & 6 & -4 & 1 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & 1 & -4 & 6 & -4 & 1 \\
 & & & 1 & -4 & 7 & -4 \\
 & & & & 1 & -4 & 3
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-2} \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 0.5h^3 \end{bmatrix}
\]

The above system of equations can be solved with the decomposition and back 
substitution routines in the functions LUdec5 and LUsol5 (see Art. 2.4). Recall that 
these functions work with the vectors d, e and f that form the diagonals of the upper 
half of the coefficient matrix. The program that sets up and solves the equations is 

function fDiff8 
% Finite difference method for the 4th-order, 
% linear boundary value problem in Example 8.8. 
xStart = 0; xStop = 0.5;     % Range of integration. 
n = 21;                      % Number of mesh points. 
freq = 1;                    % Printout frequency. 
h = (xStop - xStart)/(n-1); 
x = linspace(xStart,xStop,n)'; 
[d,e,f,b] = fDiffEqs(x,h,n); 
[d,e,f] = LUdec5(d,e,f); 
printSol(x,LUsol5(d,e,f,b),freq) 

function [d,e,f,b] = fDiffEqs(x,h,n) 
% Sets up the pentadiagonal coefficient matrix and the 
% constant vector of the finite difference equations. 
d = ones(n,1)*6; 
e = ones(n-1,1)*(-4); 
f = ones(n-2,1); 
b = zeros(n,1); 
d(1) = 1; d(2) = 7; d(n-1) = 7; d(n) = 3; 
e(1) = 0; f(1) = 0; b(n) = 0.5*h^3; 

The last two lines of the output are 

>>      x             y1 
   4.7500e-001   5.1953e-003 
   5.0000e-001   5.2344e-003 


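Thus at the mid-span we have 

\[ v|_{x=0.5L} = \frac{PL^3}{EI}\,y|_{\xi=0.5} = 5.2344\times10^{-3}\,\frac{PL^3}{EI} \]

\[ \left.\frac{d^2v}{dx^2}\right|_{x=0.5L} = \frac{PL^3}{EI}\left(\frac{1}{L^2}\left.\frac{d^2y}{d\xi^2}\right|_{\xi=0.5}\right) \approx \frac{PL}{EI}\,\frac{y_{m-1} - 2y_m + y_{m+1}}{h^2} \]

\[ = \frac{PL}{EI}\,\frac{(5.1953 - 2(5.2344) + 5.1953)\times10^{-3}}{0.025^2} = -0.12512\,\frac{PL}{EI} \]

\[ M|_{x=0.5L} = -EI\left.\frac{d^2v}{dx^2}\right|_{x=0.5L} = 0.12512\,PL \]

In comparison, the exact solution yields 

\[ v|_{x=0.5L} = 5.2083\times10^{-3}\,\frac{PL^3}{EI} \]

\[ M|_{x=0.5L} = 0.12500\,PL \]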
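
The numbers of this example can be reproduced without the banded solvers. The function below is a sketch under stated assumptions: the name fdBeamSketch is hypothetical, and a full symmetric matrix with the backslash operator replaces LUdec5 and LUsol5.

```matlab
function [yMid,M] = fdBeamSketch(n)
% Finite difference solution of Example 8.8 on n mesh points over
% 0 <= xi <= 1/2. Returns the mid-span displacement in units of
% P*L^3/(EI) and the mid-span bending moment in units of P*L.
% Assumption: a full matrix and backslash replace LUdec5 and LUsol5.
h = 0.5/(n-1);
d = 6*ones(n,1);  e = -4*ones(n-1,1);  f = ones(n-2,1);
d(1) = 1; d(2) = 7; d(n-1) = 7; d(n) = 3;   % boundary modifications
e(1) = 0; f(1) = 0;                         % first equation is y(1) = 0
b = zeros(n,1);   b(n) = 0.5*h^3;
A = diag(d) + diag(e,1) + diag(e,-1) + diag(f,2) + diag(f,-2);
y = A\b;
yMid = y(n);
% Mid-span curvature by central difference with y(n+1) = y(n-1);
% the moment is minus the curvature in these units.
M = -(2*y(n-1) - 2*y(n))/h^2;
end
```

With n = 21 the call [yMid,M] = fdBeamSketch(21) should give values close to the 5.2344 × 10^−3 and 0.125 quoted above; a larger n moves the displacement toward the exact 1/192.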


PROBLEM SET 8.2 

Problems 1-5 Use first central difference approximations to transform the boundary 
value problem shown into simultaneous equations Ay = b. 

1. y'' = (2 + x)y, y(0) = 0, y'(1) = 5. 

2. y'' = y + x^2, y(0) = 0, y(1) = 1. 

3. y'' = e^(−x) y', y(0) = 1, y(1) = 0. 

4. y^(4) = y'' − y, y(0) = 0, y'(0) = 1, y(1) = 0, y'(1) = −1. 

5. y^(4) = −9y + x, y(0) = y''(0) = 0, y'(1) = y'''(1) = 0. 

Problems 6-10 Solve the given boundary value problem with the finite difference 
method using n = 21. 

6. ■ y'' = xy, y(1) = 1.5, y(2) = 3. 

7. ■ y'' + 2y' + y = 0, y(0) = 0, y(1) = 1. Exact solution is y = xe^(1−x). 

8. ■ x^2 y'' + xy' + y = 0, y(1) = 0, y(2) = 0.638961. Exact solution is y = sin(ln x). 

9. ■ y'' = y^2 sin y, y'(0) = 0, y(π) = 1. 

10. ■ y'' + 2y(2xy' + y) = 0, y(0) = 1/2, y'(1) = −2/9. Exact solution is y = 
(2 + x^2)^(−1). 

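11. ■ 

[Figure: simply supported beam with segments of length L/4, L/2 and L/4, moments of inertia I_0 and I_1, and the load w_0 on the middle segment] 

The simply supported beam consists of three segments with the moments of 
inertia I_0 and I_1 as shown. A uniformly distributed load of intensity w_0 acts over 
the middle segment. Modeling only the left half of the beam, we can show that 
the differential equation for the displacement v is 

\[ \frac{d^2v}{dx^2} = -\frac{M}{EI} = -\frac{w_0L^2}{4EI_0} \times \begin{cases} \dfrac{x}{L} & \text{in } 0 < x < \dfrac{L}{4} \\[2ex] \dfrac{I_0}{I_1}\left[\dfrac{x}{L} - 2\left(\dfrac{x}{L} - \dfrac{1}{4}\right)^2\right] & \text{in } \dfrac{L}{4} < x < \dfrac{L}{2} \end{cases} \]

Introducing the dimensionless variables 

\[ \xi = \frac{x}{L} \qquad y = \frac{EI_0}{w_0L^4}\,v \qquad \gamma = \frac{I_1}{I_0} \]

changes the differential equation to 

\[ \frac{d^2y}{d\xi^2} = \begin{cases} -\dfrac{1}{4}\,\xi & \text{in } 0 < \xi < \dfrac{1}{4} \\[2ex] -\dfrac{1}{4\gamma}\left[\xi - 2\left(\xi - \dfrac{1}{4}\right)^2\right] & \text{in } \dfrac{1}{4} < \xi < \dfrac{1}{2} \end{cases} \]

with the boundary conditions 

\[ y|_{\xi=0} = \left.\frac{dy}{d\xi}\right|_{\xi=1/2} = 0 \]

Use the finite difference method to determine the maximum displacement of the 
beam using n = 21 and γ = 1.5 and compare it with the exact solution 

\[ v_{\max} = \frac{61}{9216}\,\frac{w_0L^4}{EI_0} \]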

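12. ■ The simply supported, tapered beam has a circular cross section. A couple of
magnitude $M_0$ is applied to the left end of the beam. The differential equation for
the displacement $v$ is

$$\frac{d^2 v}{dx^2} = -\frac{M}{EI} = -\frac{M_0(1 - x/L)}{EI_0(d/d_0)^4}$$

where

$$d = d_0\left[1 + \left(\frac{d_1}{d_0} - 1\right)\frac{x}{L}\right] \qquad I_0 = \frac{\pi d_0^4}{64}$$

Substituting

$$\xi = \frac{x}{L} \qquad y = \frac{EI_0}{M_0 L^2}\,v \qquad \delta = \frac{d_1}{d_0}$$

changes the differential equation to

$$\frac{d^2 y}{d\xi^2} = -\frac{1 - \xi}{[1 + (\delta - 1)\xi]^4}$$

with the boundary conditions

$$y\big|_{\xi=0} = y\big|_{\xi=1} = 0$$

Solve the problem with the finite difference method using $\delta = 1.5$ and n = 21; plot
$y$ vs. $\xi$. The exact solution is

$$y = -\frac{(3 + 2\delta\xi - 3\xi)\,\xi^2}{6(1 + \delta\xi - \xi)^2} + \frac{\xi}{3\delta}$$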

13. ■ Solve Example 8.4 by the finite difference method with n= 21. Hint: Compute 
the end slopes from the second noncentral differences in Tables 5.3. 

14. ■ Solve Prob. 20 in Problem Set 8.1 with the finite difference method. Use n = 21. 

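15. ■

[Figure: simply supported beam of length L resting on an elastic foundation and carrying the uniformly distributed load $w_0$.]

The simply supported beam of length $L$ is resting on an elastic foundation of
stiffness $k$ N/m². The displacement $v$ of the beam due to the uniformly distributed
load of intensity $w_0$ N/m is given by the solution of the boundary value problem

$$EI\,\frac{d^4 v}{dx^4} + kv = w_0, \qquad v\big|_{x=0} = \frac{d^2 v}{dx^2}\bigg|_{x=0} = v\big|_{x=L} = \frac{d^2 v}{dx^2}\bigg|_{x=L} = 0$$

The nondimensional form of the problem is

$$\frac{d^4 y}{d\xi^4} + \gamma y = 1, \qquad y\big|_{\xi=0} = \frac{d^2 y}{d\xi^2}\bigg|_{\xi=0} = y\big|_{\xi=1} = \frac{d^2 y}{d\xi^2}\bigg|_{\xi=1} = 0$$

where

$$\xi = \frac{x}{L} \qquad y = \frac{EI}{w_0 L^4}\,v \qquad \gamma = \frac{kL^4}{EI}$$

Solve this problem by the finite difference method with $\gamma = 10^5$ and plot $y$ vs. $\xi$.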

16. ■ Solve Prob. 15 if the ends of the beam are free and the load is confined to the 
middle half of the beam. Consider only the left half of the beam, in which case 
the nondimensional form of the problem is 


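$$\frac{d^4 y}{d\xi^4} + \gamma y = \begin{cases} 0 & \text{in } 0 < \xi < 1/4 \\ 1 & \text{in } 1/4 < \xi < 1/2 \end{cases}$$

$$\frac{d^2 y}{d\xi^2}\bigg|_{\xi=0} = \frac{d^3 y}{d\xi^3}\bigg|_{\xi=0} = \frac{dy}{d\xi}\bigg|_{\xi=1/2} = \frac{d^3 y}{d\xi^3}\bigg|_{\xi=1/2} = 0$$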


17. ■ The general form of a linear, second-order boundary value problem is

$$y'' = r(x) + s(x)y + t(x)y'$$
$$y(a) = \alpha \ \text{ or } \ y'(a) = \alpha$$
$$y(b) = \beta \ \text{ or } \ y'(b) = \beta$$

Write a program that solves this problem with the finite difference method for
any user-specified $r(x)$, $s(x)$ and $t(x)$. Test the program by solving Prob. 8.


MATLAB Functions 

MATLAB has only the following function for solution of boundary value problems: 

sol = bvp4c(@dEqs,@residual,solinit) uses a high-order finite difference
method with an adaptive mesh to solve boundary value problems. The output
sol is a structure (a MATLAB data type) created by bvp4c. The first two
input arguments are handles to the following user-supplied functions:

F = dEqs(x,y) specifies the first-order differential equations F(x, y) = y'. Both F
and y are column vectors.

r = residual(ya,yb) specifies all the applicable boundary residuals $y_i(a) - \alpha_i$
and $y_i(b) - \beta_i$ in a column vector r, where $\alpha_i$ and $\beta_i$ are the prescribed
boundary values.

The third input argument solinit is a structure that contains the x- and y-values at
the nodes of the initial mesh. This structure can be generated with MATLAB's function
bvpinit:

solinit = bvpinit(xinit,@yguess) where xinit is a vector containing the x-
coordinates of the nodes; yguess(x) is a user-supplied function that returns a
column vector containing the trial solutions for the components of y.

The numerical solution at user-defined mesh points can be extracted from the
structure sol with the MATLAB function deval:

y = deval(sol,xmesh) where xmesh is an array containing the x-coordinates of
the mesh points. The function returns a matrix with the ith row containing the
values of $y_i$ at the mesh points.

The following program illustrates the use of the above functions in solving 
Example 8.1: 

function shoot2_matlab
% Solution of Example 8.1 with MATLAB's function bvp4c.

xinit = linspace(0,2,11)';
solinit = bvpinit(xinit,@yguess);
sol = bvp4c(@dEqs,@residual,solinit);
y = deval(sol,xinit)';
printSol(xinit,y,1)                 % This is our own func.

function F = dEqs(x,y)              % Differential eqs.
F = [y(2); -3*y(1)*y(2)];

function r = residual(ya,yb)        % Boundary residuals.
r = [ya(1); yb(1) - 1];

function yinit = yguess(x)          % Initial guesses for
yinit = [0.5*x; 0.5];               % y1 and y2.




Symmetric Matrix Eigenvalue Problems 


Find $\lambda$ for which nontrivial solutions of $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$ exist


9.1 Introduction 


The standard form of the matrix eigenvalue problem is

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{x} \tag{9.1}$$

where $\mathbf{A}$ is a given $n \times n$ matrix. The problem is to find the scalar $\lambda$ and the vector $\mathbf{x}$.
Rewriting Eq. (9.1) in the form

$$(\mathbf{A} - \lambda\mathbf{I})\,\mathbf{x} = \mathbf{0} \tag{9.2}$$

it becomes apparent that we are dealing with a system of $n$ homogeneous equations.
An obvious solution is the trivial one $\mathbf{x} = \mathbf{0}$. A nontrivial solution can exist only if the
determinant of the coefficient matrix vanishes; that is, if

$$|\mathbf{A} - \lambda\mathbf{I}| = 0 \tag{9.3}$$

Expansion of the determinant leads to the polynomial equation known as the
characteristic equation

$$a_1\lambda^n + a_2\lambda^{n-1} + \cdots + a_n\lambda + a_{n+1} = 0$$

which has the roots $\lambda_i$, $i = 1, 2, \ldots, n$, called the eigenvalues of the matrix $\mathbf{A}$. The
solutions $\mathbf{x}_i$ of $(\mathbf{A} - \lambda_i\mathbf{I})\,\mathbf{x} = \mathbf{0}$ are known as the eigenvectors.

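As an example, consider the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \tag{a}$$

The characteristic equation is

$$|\mathbf{A} - \lambda\mathbf{I}| = \begin{vmatrix} 1 - \lambda & -1 & 0 \\ -1 & 2 - \lambda & -1 \\ 0 & -1 & 1 - \lambda \end{vmatrix} = -3\lambda + 4\lambda^2 - \lambda^3 = 0 \tag{b}$$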


The roots of this equation are $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 3$. To compute the eigenvector
corresponding to $\lambda_3$, we substitute $\lambda = \lambda_3$ into Eq. (9.2), obtaining

$$\begin{bmatrix} -2 & -1 & 0 \\ -1 & -1 & -1 \\ 0 & -1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{c}$$

We know that the determinant of the coefficient matrix is zero, so that the equations
are not linearly independent. Therefore, we can assign an arbitrary value to any one
component of $\mathbf{x}$ and use two of the equations to compute the other two components.
Choosing $x_1 = 1$, the first equation of Eq. (c) yields $x_2 = -2$ and from the third equation
we get $x_3 = 1$. Thus the eigenvector associated with $\lambda_3$ is

$$\mathbf{x}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$


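The other two eigenvectors

$$\mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \qquad \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

can be obtained in the same manner.

It is sometimes convenient to display the eigenvectors as columns of a matrix $\mathbf{X}$.
For the problem at hand, this matrix is

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -2 \\ 1 & -1 & 1 \end{bmatrix}$$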


It is clear from the above example that the magnitude of an eigenvector is indeterminate;
only its direction can be computed from Eq. (9.2). It is customary to normalize
the eigenvectors by assigning a unit magnitude to each vector. Thus the normalized
eigenvectors in our example are

$$\mathbf{X} = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \end{bmatrix}$$


Throughout this chapter we assume that the eigenvectors are normalized. 
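
The example can be checked with MATLAB's built-in function eig. The following is only a sketch for verification, not one of the methods developed in this chapter; the sorting step and the variable names are our own additions. For a symmetric matrix eig returns eigenvectors of unit magnitude, so one of them is rescaled below for comparison with the hand computation.

```matlab
A = [ 1 -1  0
     -1  2 -1
      0 -1  1];
[X,D] = eig(A);                       % columns of X are unit-length eigenvectors
[lam,idx] = sort(diag(D));            % put the eigenvalues in ascending order
X = X(:,idx);                         % reorder the eigenvectors to match
residual = norm(A*X - X*diag(lam));   % should be of roundoff magnitude
x3 = X(:,3)/X(1,3);                   % rescale so that the first component is 1
disp(lam'), disp(x3')
```

The rescaled vector x3 should reproduce the eigenvector associated with the largest eigenvalue, up to roundoff.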
Here are some useful properties of eigenvalues and eigenvectors, given without
proof:

• All eigenvalues of a symmetric matrix are real.

• All eigenvalues of a symmetric, positive-definite matrix are real and positive.

• The eigenvectors of a symmetric matrix are orthonormal; that is, $\mathbf{X}^T\mathbf{X} = \mathbf{I}$.

• If the eigenvalues of $\mathbf{A}$ are $\lambda_i$, then the eigenvalues of $\mathbf{A}^{-1}$ are $\lambda_i^{-1}$.
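
These properties are easy to spot-check numerically. The sketch below uses a small symmetric, positive-definite matrix chosen by us purely for illustration, together with MATLAB's built-in eig; a numerical check of this kind is of course not a proof.

```matlab
A = [ 4 -2  1
     -2  4 -2
      1 -2  3];                        % symmetric, positive definite (illustrative)
[X,D] = eig(A);
lam = sort(diag(D));                   % eigenvalues of A
lamInv = sort(eig(inv(A)));            % eigenvalues of inv(A)
allPositive = all(lam > 0);            % positive definite => positive eigenvalues
orthoErr = norm(X'*X - eye(3));        % orthonormality: X'*X = I
invErr = norm(sort(1./lam) - lamInv);  % eigenvalues of inv(A) are 1/lambda
```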

Eigenvalue problems that originate from physical problems often end up with 
a symmetric A. This is fortunate, because symmetric eigenvalue problems are much 
easier to solve than their nonsymmetric counterparts. In this chapter we largely restrict 
our discussion to eigenvalues and eigenvectors of symmetric matrices. 

Common sources of eigenvalue problems are the analysis of vibrations and stability.
These problems often have the following characteristics:

• The matrices are large and sparse (e.g., have a banded structure).

• We need to know only the eigenvalues; if eigenvectors are required, only a few of
them are of interest.

A useful eigenvalue solver must be able to utilize these characteristics to minimize 
the computations. In particular, it should be flexible enough to compute only what 
we need and no more. 


9.2 Jacobi Method 

Similarity Transformation and Diagonalization 

Consider the standard matrix eigenvalue problem

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{x} \tag{9.4}$$

where $\mathbf{A}$ is symmetric. Let us now apply the transformation

$$\mathbf{x} = \mathbf{P}\mathbf{x}^* \tag{9.5}$$

where $\mathbf{P}$ is a nonsingular matrix. Substituting Eq. (9.5) into Eq. (9.4) and premultiplying
each side by $\mathbf{P}^{-1}$, we get

$$\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\mathbf{x}^* = \lambda\mathbf{P}^{-1}\mathbf{P}\mathbf{x}^*$$

or

$$\mathbf{A}^*\mathbf{x}^* = \lambda\mathbf{x}^* \tag{9.6}$$

where $\mathbf{A}^* = \mathbf{P}^{-1}\mathbf{A}\mathbf{P}$. Because $\lambda$ was untouched by the transformation, the eigenvalues
of $\mathbf{A}$ are also the eigenvalues of $\mathbf{A}^*$. Matrices that have the same eigenvalues are
deemed to be similar, and the transformation between them is called a similarity
transformation.

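Similarity transformations are frequently used to change an eigenvalue problem
to a form that is easier to solve. Suppose that we managed by some means to find a $\mathbf{P}$
that diagonalizes $\mathbf{A}^*$, so that Eqs. (9.6) are

$$\begin{bmatrix} A^*_{11} - \lambda & 0 & \cdots & 0 \\ 0 & A^*_{22} - \lambda & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A^*_{nn} - \lambda \end{bmatrix} \begin{bmatrix} x^*_1 \\ x^*_2 \\ \vdots \\ x^*_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

The solution of these equations is

$$\lambda_1 = A^*_{11} \qquad \lambda_2 = A^*_{22} \qquad \cdots \qquad \lambda_n = A^*_{nn} \tag{9.7}$$

$$\mathbf{x}^*_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad \mathbf{x}^*_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} \qquad \cdots \qquad \mathbf{x}^*_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

or

$$\mathbf{X}^* = \begin{bmatrix} \mathbf{x}^*_1 & \mathbf{x}^*_2 & \cdots & \mathbf{x}^*_n \end{bmatrix} = \mathbf{I}$$

According to Eq. (9.5) the eigenvector matrix of $\mathbf{A}$ is

$$\mathbf{X} = \mathbf{P}\mathbf{X}^* = \mathbf{P}\mathbf{I} = \mathbf{P} \tag{9.8}$$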

Hence the transformation matrix P is the eigenvector matrix of A and the eigenvalues 
of A are the diagonal terms of A*. 
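
Both statements can be illustrated numerically. In the sketch below the matrices A and P are arbitrary choices of ours, and MATLAB's eig is used only to supply the eigenvalues and an eigenvector matrix for the check: a similarity transformation with any nonsingular P leaves the eigenvalues unchanged, and choosing P to be the eigenvector matrix makes A* diagonal.

```matlab
A = [ 4 -2  1
     -2  4 -2
      1 -2  3];                         % symmetric matrix (illustrative)
P = [2 1 0; 0 1 1; 1 0 1];              % any nonsingular matrix (det = 3)
Astar = P\(A*P);                        % A* = inv(P)*A*P
lamA = sort(eig(A));
lamAstar = sort(real(eig(Astar)));      % real() discards roundoff-level imaginary parts
diffLam = norm(lamA - lamAstar);        % the two sets of eigenvalues should agree

[X,D] = eig(A);                         % now take P = eigenvector matrix of A
Adiag = X'*A*X;                         % X is orthogonal, so inv(X) = X'
offDiag = norm(Adiag - diag(diag(Adiag)));   % off-diagonal part should vanish
```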


Jacobi Rotation 

A special transformation is the plane rotation

$$\mathbf{x} = \mathbf{R}\mathbf{x}^* \tag{9.9}$$

where

$$\mathbf{R} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & c & 0 & 0 & s & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -s & 0 & 0 & c & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \tag{9.10}$$

(the third and sixth columns are the columns $k$ and $l$, respectively)
is called the Jacobi rotation matrix. Note that $\mathbf{R}$ is an identity matrix modified by the
terms $c = \cos\theta$ and $s = \sin\theta$ appearing at the intersections of columns/rows $k$ and
$l$, where $\theta$ is the rotation angle. The rotation matrix has the useful property of being
orthogonal, or unitary, meaning that

$$\mathbf{R}^{-1} = \mathbf{R}^T \tag{9.11}$$

One consequence of orthogonality is that the transformation in Eq. (9.9) has the
essential characteristic of a rotation: it preserves the magnitude of the vector; that is,
$|\mathbf{x}| = |\mathbf{x}^*|$.

The similarity transformation corresponding to the plane rotation in Eq. (9.9) is

$$\mathbf{A}^* = \mathbf{R}^{-1}\mathbf{A}\mathbf{R} = \mathbf{R}^T\mathbf{A}\mathbf{R} \tag{9.12}$$

The matrix $\mathbf{A}^*$ not only has the same eigenvalues as the original matrix $\mathbf{A}$, but due to
orthogonality of $\mathbf{R}$ it is also symmetric. The transformation in Eq. (9.12) changes only
the rows/columns $k$ and $l$ of $\mathbf{A}$. The formulas for these changes are

$$\begin{aligned} A^*_{kk} &= c^2 A_{kk} + s^2 A_{ll} - 2cs A_{kl} \\ A^*_{ll} &= c^2 A_{ll} + s^2 A_{kk} + 2cs A_{kl} \\ A^*_{kl} &= A^*_{lk} = (c^2 - s^2) A_{kl} + cs(A_{kk} - A_{ll}) \\ A^*_{ki} &= A^*_{ik} = cA_{ki} - sA_{li}, \quad i \neq k,\ i \neq l \\ A^*_{li} &= A^*_{il} = cA_{li} + sA_{ki}, \quad i \neq k,\ i \neq l \end{aligned} \tag{9.13}$$

Jacobi Diagonalization

The angle $\theta$ in the Jacobi rotation matrix can be chosen so that $A^*_{kl} = A^*_{lk} = 0$. This suggests
the following idea: why not diagonalize $\mathbf{A}$ by looping through all the off-diagonal
terms and eliminate them one by one? This is exactly what Jacobi diagonalization does.
However, there is a major snag: the transformation that annihilates an off-diagonal
term also undoes some of the previously created zeroes. Fortunately, it turns out that
the off-diagonal terms that reappear will be smaller than before. Thus the Jacobi method
is an iterative procedure that repeatedly applies Jacobi rotations until the off-diagonal
terms have virtually vanished. The final transformation matrix $\mathbf{P}$ is the accumulation
of individual rotations $\mathbf{R}_i$:

$$\mathbf{P} = \mathbf{R}_1\mathbf{R}_2\mathbf{R}_3 \cdots \tag{9.14}$$

The columns of $\mathbf{P}$ finish up being the eigenvectors of $\mathbf{A}$ and the diagonal elements of
$\mathbf{A}^* = \mathbf{P}^T\mathbf{A}\mathbf{P}$ become the eigenvalues.

Let us now look at the details of a Jacobi rotation. From Eq. (9.13) we see that
$A^*_{kl} = 0$ if

$$(c^2 - s^2)A_{kl} + cs(A_{kk} - A_{ll}) = 0 \tag{a}$$

Using the trigonometric identities $c^2 - s^2 = \cos 2\theta$ and $cs = (1/2)\sin 2\theta$, we obtain
from Eq. (a)

$$\tan 2\theta = -\frac{2A_{kl}}{A_{kk} - A_{ll}} \tag{b}$$

which could be solved for $\theta$, followed by computation of $c = \cos\theta$ and $s = \sin\theta$. However,
the procedure described below leads to a better algorithm. [20]

[20] The procedure is adapted from W.H. Press et al., Numerical Recipes in Fortran, 2nd ed. (1992),
Cambridge University Press.

Introducing the notation

$$\phi = \cot 2\theta = -\frac{A_{kk} - A_{ll}}{2A_{kl}} \tag{9.15}$$

and utilizing the trigonometric identity

$$\tan 2\theta = \frac{2t}{1 - t^2}$$

where $t = \tan\theta$, we can write Eq. (b) as

$$t^2 + 2\phi t - 1 = 0$$

which has the roots

$$t = -\phi \pm \sqrt{\phi^2 + 1}$$

It has been found that the root $|t| \le 1$, which corresponds to $|\theta| \le 45°$, leads to the
more stable transformation. Therefore, we choose the plus sign if $\phi > 0$ and the minus
sign if $\phi \le 0$, which is equivalent to using

$$t = \operatorname{sgn}(\phi)\left(-|\phi| + \sqrt{\phi^2 + 1}\right)$$

To forestall excessive roundoff error if $\phi$ is large, we multiply both sides of the equation
by $|\phi| + \sqrt{\phi^2 + 1}$ and solve for $t$, which yields

$$t = \frac{\operatorname{sgn}(\phi)}{|\phi| + \sqrt{\phi^2 + 1}} \tag{9.16a}$$

In the case of very large $\phi$, we should replace Eq. (9.16a) by the approximation

$$t = \frac{1}{2\phi} \tag{9.16b}$$

to prevent overflow in the computation of $\phi^2$. Having computed $t$, we can use the
trigonometric relationship $\tan\theta = \sin\theta/\cos\theta = \sqrt{1 - \cos^2\theta}/\cos\theta$ to obtain

$$c = \frac{1}{\sqrt{1 + t^2}} \qquad s = tc \tag{9.17}$$
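
The computation of the rotation parameters can be sketched in a few lines. The matrix A and the indices k and l below are arbitrary illustrative choices of ours; the rotation matrix is assembled explicitly only to confirm that the chosen angle annihilates the targeted off-diagonal term, which an efficient implementation would not do.

```matlab
A = [4 1 2; 1 3 0; 2 0 2];              % symmetric matrix (illustrative)
k = 1; l = 3;                           % off-diagonal term to be annihilated
phi = (A(l,l) - A(k,k))/(2*A(k,l));     % Eq. (9.15)
t = 1/(abs(phi) + sqrt(phi^2 + 1));     % magnitude of t from Eq. (9.16a)
if phi < 0, t = -t; end                 % apply the sign of phi
c = 1/sqrt(1 + t^2); s = t*c;           % Eq. (9.17)

R = eye(3);                             % rotation matrix of Eq. (9.10)
R(k,k) = c; R(l,l) = c; R(k,l) = s; R(l,k) = -s;
Astar = R'*A*R;                         % Eq. (9.12)
fprintf('t = %.5f   A*(k,l) = %.2e\n', t, Astar(k,l))
```

Because the smaller root is selected, |t| never exceeds 1, so the rotation angle stays within 45 degrees.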

We now improve the computational properties of the transformation formulas in
Eqs. (9.13). Solving Eq. (a) for $A_{ll}$, we obtain

$$A_{ll} = A_{kk} + A_{kl}\,\frac{c^2 - s^2}{cs} \tag{c}$$

Replacing all occurrences of $A_{ll}$ by Eq. (c) and simplifying, we can write the transformation
formulas in Eqs. (9.13) as

$$\begin{aligned} A^*_{kk} &= A_{kk} - tA_{kl} \\ A^*_{ll} &= A_{ll} + tA_{kl} \\ A^*_{kl} &= A^*_{lk} = 0 \\ A^*_{ki} &= A^*_{ik} = A_{ki} - s(A_{li} + \tau A_{ki}), \quad i \neq k,\ i \neq l \\ A^*_{li} &= A^*_{il} = A_{li} + s(A_{ki} - \tau A_{li}), \quad i \neq k,\ i \neq l \end{aligned} \tag{9.18}$$

where

$$\tau = \frac{s}{1 + c} \tag{9.19}$$

The introduction of $\tau$ allowed us to express each formula in the form (original
value) + (change), which is helpful in reducing the roundoff error.

At the start of Jacobi's diagonalization process the transformation matrix $\mathbf{P}$ is
initialized to the identity matrix. Each Jacobi rotation changes this matrix from $\mathbf{P}$
to $\mathbf{P}^* = \mathbf{P}\mathbf{R}$. The corresponding changes in the elements of $\mathbf{P}$ can be shown to be
(only the columns $k$ and $l$ are affected)

$$\begin{aligned} P^*_{ik} &= P_{ik} - s(P_{il} + \tau P_{ik}) \\ P^*_{il} &= P_{il} + s(P_{ik} - \tau P_{il}) \end{aligned} \tag{9.20}$$

We still have to decide the order in which the off-diagonal elements of A are to be 
eliminated. Jacobi’s original idea was to attack the largest element since this results 
in fewest number of rotations. The problem here is that A has to be searched for 
the largest element after every rotation, which is a time-consuming process. If the 
matrix is large, it is faster to sweep through it by rows or columns and annihilate 
every element above some threshold value. In the next sweep the threshold is lowered 
and the process repeated. We adopt Jacobi’s original scheme because of its simpler 
implementation. 

In summary, the Jacobi diagonalization procedure, which uses only the upper half 
of the matrix, is 

1. Find the largest (absolute value) off-diagonal element $A_{kl}$ in the upper half of $\mathbf{A}$.

2. Compute $\phi$, $t$, $c$ and $s$ from Eqs. (9.15)-(9.17).

3. Compute $\tau$ from Eq. (9.19).

4. Modify the elements in the upper half of $\mathbf{A}$ according to Eqs. (9.18).

5. Update the transformation matrix $\mathbf{P}$ using Eqs. (9.20).

Repeat the procedure until $|A_{kl}| < \varepsilon$, where $\varepsilon$ is the error tolerance.


■ jacobi

The function jacobi computes all eigenvalues $\lambda_i$ and eigenvectors $\mathbf{x}_i$ of a symmetric,
$n \times n$ matrix $\mathbf{A}$ by the Jacobi method. The algorithm works exclusively with the upper
triangular part of $\mathbf{A}$, which is destroyed in the process. The principal diagonal of $\mathbf{A}$ is
replaced by the eigenvalues, and the columns of the transformation matrix $\mathbf{P}$ become
the normalized eigenvectors.

function [eVals,eVecs] = jacobi(A,tol)
% Jacobi method for computing eigenvalues and
% eigenvectors of a symmetric matrix A.
% USAGE: [eVals,eVecs] = jacobi(A,tol)
% tol = error tolerance (default is 1.0e-9).

if nargin < 2; tol = 1.0e-9; end
n = size(A,1);
maxRot = 5*(n^2);                 % Limit number of rotations
P = eye(n);                       % Initialize rotation matrix
for i = 1:maxRot                  % Begin Jacobi rotations
    [Amax,k,L] = maxElem(A);
    if Amax < tol;
        eVals = diag(A); eVecs = P;
        return
    end
    [A,P] = rotate(A,P,k,L);
end
error('Too many Jacobi rotations')

function [Amax,k,L] = maxElem(A)
% Finds Amax = A(k,L) (largest off-diag. elem. of A).
n = size(A,1);
Amax = 0;
for i = 1:n-1
    for j = i+1:n
        if abs(A(i,j)) >= Amax
            Amax = abs(A(i,j));
            k = i; L = j;
        end
    end
end

function [A,P] = rotate(A,P,k,L)
% Zeros A(k,L) by a Jacobi rotation and updates
% transformation matrix P.
n = size(A,1);
diff = A(L,L) - A(k,k);
if abs(A(k,L)) < abs(diff)*1.0e-36
    t = A(k,L)/diff;              % Eq. (9.16b): t = 1/(2*phi)
else
    phi = diff/(2*A(k,L));
    t = 1/(abs(phi) + sqrt(phi^2 + 1));
    if phi < 0; t = -t; end;
end
c = 1/sqrt(t^2 + 1); s = t*c;
tau = s/(1 + c);
temp = A(k,L); A(k,L) = 0;
A(k,k) = A(k,k) - t*temp;
A(L,L) = A(L,L) + t*temp;
for i = 1:k-1                     % For i < k
    temp = A(i,k);
    A(i,k) = temp - s*(A(i,L) + tau*temp);
    A(i,L) = A(i,L) + s*(temp - tau*A(i,L));
end
for i = k+1:L-1                   % For k < i < L
    temp = A(k,i);
    A(k,i) = temp - s*(A(i,L) + tau*A(k,i));
    A(i,L) = A(i,L) + s*(temp - tau*A(i,L));
end
for i = L+1:n                     % For i > L
    temp = A(k,i);
    A(k,i) = temp - s*(A(L,i) + tau*temp);
    A(L,i) = A(L,i) + s*(temp - tau*A(L,i));
end
for i = 1:n                       % Update transformation matrix
    temp = P(i,k);
    P(i,k) = temp - s*(P(i,L) + tau*P(i,k));
    P(i,L) = P(i,L) + s*(temp - tau*P(i,L));
end

■ sortEigen

The eigenvalues/eigenvectors returned by jacobi are not ordered. The function listed
below can be used to sort the results into ascending order of eigenvalues.

function [eVals,eVecs] = sortEigen(eVals,eVecs)
% Sorts eigenvalues & eigenvectors into ascending
% order of eigenvalues.
% USAGE: [eVals,eVecs] = sortEigen(eVals,eVecs)

n = length(eVals);
for i = 1:n-1
    index = i; val = eVals(i);
    for j = i+1:n
        if eVals(j) < val
            index = j; val = eVals(j);
        end
    end
    if index ~= i
        eVals = swapRows(eVals,i,index);
        eVecs = swapCols(eVecs,i,index);
    end
end


Transformation to Standard Form 

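Physical problems often give rise to eigenvalue problems of the form

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x} \tag{9.21}$$

where $\mathbf{A}$ and $\mathbf{B}$ are symmetric $n \times n$ matrices. We assume that $\mathbf{B}$ is also positive definite.
Such problems must be transformed into the standard form before they can be solved
by Jacobi diagonalization.

As $\mathbf{B}$ is symmetric and positive definite, we can apply Choleski's decomposition
$\mathbf{B} = \mathbf{L}\mathbf{L}^T$, where $\mathbf{L}$ is a lower-triangular matrix (see Art. 2.3). Then we introduce the
transformation

$$\mathbf{x} = (\mathbf{L}^{-1})^T\mathbf{z} \tag{9.22}$$

Substituting into Eq. (9.21), we get

$$\mathbf{A}(\mathbf{L}^{-1})^T\mathbf{z} = \lambda\mathbf{L}\mathbf{L}^T(\mathbf{L}^{-1})^T\mathbf{z}$$

Premultiplying both sides by $\mathbf{L}^{-1}$ results in

$$\mathbf{L}^{-1}\mathbf{A}(\mathbf{L}^{-1})^T\mathbf{z} = \lambda\mathbf{L}^{-1}\mathbf{L}\mathbf{L}^T(\mathbf{L}^{-1})^T\mathbf{z}$$

Because $\mathbf{L}^{-1}\mathbf{L} = \mathbf{L}^T(\mathbf{L}^{-1})^T = \mathbf{I}$, the last equation reduces to the standard form

$$\mathbf{H}\mathbf{z} = \lambda\mathbf{z} \tag{9.23}$$

where

$$\mathbf{H} = \mathbf{L}^{-1}\mathbf{A}(\mathbf{L}^{-1})^T \tag{9.24}$$

An important property of this transformation is that it does not destroy the symmetry
of the matrix; i.e., a symmetric $\mathbf{A}$ results in a symmetric $\mathbf{H}$.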

Here is the general procedure for solving eigenvalue problems of the form
$\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$:

1. Use Choleski's decomposition $\mathbf{B} = \mathbf{L}\mathbf{L}^T$ to compute $\mathbf{L}$.

2. Compute $\mathbf{L}^{-1}$ (a triangular matrix can be inverted with relatively small computational
effort).

3. Compute $\mathbf{H}$ from Eq. (9.24).

4. Solve the standard eigenvalue problem $\mathbf{H}\mathbf{z} = \lambda\mathbf{z}$ (e.g., using the Jacobi method).

5. Recover the eigenvectors of the original problem from Eq. (9.22): $\mathbf{x} = (\mathbf{L}^{-1})^T\mathbf{z}$.
Note that the eigenvalues were untouched by the transformation.
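
The five steps can be traced with built-in MATLAB functions. This is only a sketch under stated assumptions: the matrices A and B are illustrative choices of ours, chol(B,'lower') stands in for the Choleski routine of Art. 2.3, and eig stands in for the Jacobi method. The residual at the end checks that the recovered pairs satisfy the original, nonstandard problem.

```matlab
A = [4 -1 0; -1 4 -1; 0 -1 4];     % symmetric (illustrative)
B = [2  1 0;  1 2  1; 0  1 2];     % symmetric, positive definite (illustrative)
L = chol(B,'lower');               % step 1: B = L*L'
Linv = inv(L);                     % step 2: inverse of the triangular factor
H = Linv*A*Linv';                  % step 3: Eq. (9.24)
H = (H + H')/2;                    % remove roundoff-level asymmetry
[Z,D] = eig(H);                    % step 4: standard problem H*z = lambda*z
lam = diag(D);
X = Linv'*Z;                       % step 5: Eq. (9.22)
residual = norm(A*X - B*X*diag(lam));   % A*x = lambda*B*x for every pair
```

The columns of X are not of unit magnitude after step 5, so they would still have to be normalized if normalized eigenvectors are required.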


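An important special case is where $\mathbf{B}$ is a diagonal matrix:

$$\mathbf{B} = \begin{bmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{bmatrix} \tag{9.25}$$

Here

$$\mathbf{L} = \begin{bmatrix} \beta_1^{1/2} & 0 & \cdots & 0 \\ 0 & \beta_2^{1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n^{1/2} \end{bmatrix} \qquad \mathbf{L}^{-1} = \begin{bmatrix} \beta_1^{-1/2} & 0 & \cdots & 0 \\ 0 & \beta_2^{-1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n^{-1/2} \end{bmatrix} \tag{9.26a}$$

and

$$H_{ij} = \frac{A_{ij}}{(\beta_i\beta_j)^{1/2}} \tag{9.26b}$$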
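
For a diagonal B the transformation therefore requires no decomposition at all. The sketch below, with an illustrative A and diagonal entries of our own choosing, evaluates Eq. (9.26b) element by element and compares the result with the general formula in Eq. (9.24).

```matlab
A = [ 1/3 -1/3  0
     -1/3  4/3 -1
      0   -1    2];                % symmetric (illustrative)
beta = [1; 1; 2];                  % diagonal terms of B
H = A./sqrt(beta*beta');           % Eq. (9.26b): H(i,j) = A(i,j)/sqrt(beta(i)*beta(j))
Linv = diag(1./sqrt(beta));        % Eq. (9.26a)
Hcheck = Linv*A*Linv';             % general formula, Eq. (9.24)
err = norm(H - Hcheck);            % the two results should coincide
```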


■ stdForm

Given the matrices $\mathbf{A}$ and $\mathbf{B}$, the function stdForm returns $\mathbf{H}$ and the transformation
matrix $\mathbf{T} = (\mathbf{L}^{-1})^T$. The inversion of $\mathbf{L}$ is carried out by the subfunction invert (the
triangular shape of $\mathbf{L}$ allows this to be done by back substitution).

function [H,T] = stdForm(A,B)
% Transforms A*x = lambda*B*x to H*z = lambda*z
% and computes transformation matrix T in x = T*z.
% USAGE: [H,T] = stdForm(A,B)

n = size(A,1);
L = choleski(B); Linv = invert(L);
H = Linv*(A*Linv'); T = Linv';

function Linv = invert(L)
% Inverts lower triangular matrix L.
n = size(L,1);
for j = 1:n-1
    L(j,j) = 1/L(j,j);
    for i = j+1:n
        L(i,j) = -dot(L(i,j:i-1), L(j:i-1,j)/L(i,i));
    end
end
L(n,n) = 1/L(n,n); Linv = L;


EXAMPLE 9.1 


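[Figure: material element showing the state of stress; the labels include 80 MPa and 40 MPa.]

The stress matrix (tensor) corresponding to the state of stress shown is

$$\mathbf{S} = \begin{bmatrix} 80 & 30 & 0 \\ 30 & 40 & 0 \\ 0 & 0 & 60 \end{bmatrix} \text{MPa}$$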


(each row of the matrix consists of the three stress components acting on a coordinate 
plane). It can be shown that the eigenvalues of S are the principal stresses and the 
eigenvectors are normal to the principal planes. (1) Determine the principal stresses 
by diagonalizing S with a Jacobi rotation and (2) compute the eigenvectors. 

Solution of Part (1) To eliminate $S_{12}$ we must apply a rotation in the 1-2 plane. With
$k = 1$ and $l = 2$ Eq. (9.15) is

$$\phi = -\frac{S_{11} - S_{22}}{2S_{12}} = -\frac{80 - 40}{2(30)} = -\frac{2}{3}$$

Equation (9.16a) then yields

$$t = \frac{\operatorname{sgn}(\phi)}{|\phi| + \sqrt{\phi^2 + 1}} = \frac{-1}{2/3 + \sqrt{(2/3)^2 + 1}} = -0.535\,18$$

According to Eqs. (9.18), the changes in $\mathbf{S}$ due to the rotation are

$$S^*_{11} = S_{11} - tS_{12} = 80 - (-0.535\,18)(30) = 96.055 \text{ MPa}$$
$$S^*_{22} = S_{22} + tS_{12} = 40 + (-0.535\,18)(30) = 23.945 \text{ MPa}$$
$$S^*_{12} = S^*_{21} = 0$$

Hence the diagonalized stress matrix is

$$\mathbf{S}^* = \begin{bmatrix} 96.055 & 0 & 0 \\ 0 & 23.945 & 0 \\ 0 & 0 & 60 \end{bmatrix}$$

where the diagonal terms are the principal stresses.


Solution of Part (2) To compute the eigenvectors, we start with Eqs. (9.17) and (9.19),
which yield

$$c = \frac{1}{\sqrt{1 + t^2}} = \frac{1}{\sqrt{1 + (-0.535\,18)^2}} = 0.881\,68$$

$$s = tc = (-0.535\,18)(0.881\,68) = -0.471\,86$$

$$\tau = \frac{s}{1 + c} = \frac{-0.471\,86}{1 + 0.881\,68} = -0.250\,77$$

We obtain the changes in the transformation matrix $\mathbf{P}$ from Eqs. (9.20). Because $\mathbf{P}$ is
initialized to the identity matrix ($P_{ii} = 1$ and $P_{ij} = 0$, $i \neq j$) the first equation gives us

$$P^*_{11} = P_{11} - s(P_{12} + \tau P_{11}) = 1 - (-0.471\,86)\left[0 + (-0.250\,77)(1)\right] = 0.881\,67$$

$$P^*_{21} = P_{21} - s(P_{22} + \tau P_{21}) = 0 - (-0.471\,86)\left[1 + (-0.250\,77)(0)\right] = 0.471\,86$$

Similarly, the second equation of Eqs. (9.20) yields

$$P^*_{12} = -0.471\,86 \qquad P^*_{22} = 0.881\,67$$

The third row and column of $\mathbf{P}$ are not affected by the transformation. Thus

$$\mathbf{P}^* = \begin{bmatrix} 0.881\,67 & -0.471\,86 & 0 \\ 0.471\,86 & 0.881\,67 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The columns of $\mathbf{P}^*$ are the eigenvectors of $\mathbf{S}$.
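
As a quick check of the hand computation, the transformation can be applied numerically. The sketch below uses the five-digit entries of the transformation matrix obtained above and compares the resulting diagonal with the eigenvalues returned by MATLAB's eig; the variable names are ours.

```matlab
S = [80 30 0; 30 40 0; 0 0 60];    % stress matrix, MPa
P = [0.88167 -0.47186 0
     0.47186  0.88167 0
     0        0       1];          % transformation matrix from the hand computation
Sstar = P'*S*P;                    % should be diagonal to within the rounding of P
lamHand = sort(diag(Sstar));       % principal stresses from the rotation
lamEig = sort(eig(S));             % principal stresses from eig
disp([lamHand lamEig])
```

The small off-diagonal terms that remain in Sstar reflect only the five-digit rounding of P.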


EXAMPLE 9.2 


[Figure: electric circuit with three loops carrying the currents $i_1$, $i_2$ and $i_3$; the inductances are L, L and 2L.]

(1) Show that the analysis of the electric circuit shown leads to a matrix eigenvalue
problem. (2) Determine the circular frequencies and the relative amplitudes of the
currents.


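Solution of Part (1) Kirchhoff's equations for the three loops are

$$L\frac{di_1}{dt} + \frac{q_1 - q_2}{3C} = 0$$

$$L\frac{di_2}{dt} + \frac{q_2 - q_1}{3C} + \frac{q_2 - q_3}{C} = 0$$

$$2L\frac{di_3}{dt} + \frac{q_3 - q_2}{C} + \frac{q_3}{C} = 0$$

Differentiating and substituting $dq_k/dt = i_k$, we get

$$\frac{1}{3}i_1 - \frac{1}{3}i_2 = -LC\frac{d^2 i_1}{dt^2}$$

$$-\frac{1}{3}i_1 + \frac{4}{3}i_2 - i_3 = -LC\frac{d^2 i_2}{dt^2}$$

$$-i_2 + 2i_3 = -2LC\frac{d^2 i_3}{dt^2}$$

These equations admit the solution

$$i_k(t) = u_k \sin\omega t$$

where $\omega$ is the circular frequency of oscillation (measured in rad/s) and $u_k$ are the
relative amplitudes of the currents. Substitution into Kirchhoff's equations yields
$\mathbf{A}\mathbf{u} = \lambda\mathbf{B}\mathbf{u}$ ($\sin\omega t$ cancels out), where

$$\mathbf{A} = \begin{bmatrix} 1/3 & -1/3 & 0 \\ -1/3 & 4/3 & -1 \\ 0 & -1 & 2 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \qquad \lambda = LC\omega^2$$

which represents an eigenvalue problem of the nonstandard form.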


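Solution of Part (2) Since $\mathbf{B}$ is a diagonal matrix, we can readily transform the problem
into the standard form $\mathbf{H}\mathbf{z} = \lambda\mathbf{z}$. From Eq. (9.26a) we get

$$\mathbf{L}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/\sqrt{2} \end{bmatrix}$$

and Eq. (9.26b) yields

$$\mathbf{H} = \begin{bmatrix} 1/3 & -1/3 & 0 \\ -1/3 & 4/3 & -1/\sqrt{2} \\ 0 & -1/\sqrt{2} & 1 \end{bmatrix}$$

The eigenvalues and eigenvectors of $\mathbf{H}$ can now be obtained with the Jacobi method.
Skipping the details, we obtain the following results:

$$\lambda_1 = 0.147\,79 \qquad \lambda_2 = 0.582\,35 \qquad \lambda_3 = 1.936\,53$$

$$\mathbf{z}_1 = \begin{bmatrix} 0.810\,27 \\ 0.451\,02 \\ 0.374\,23 \end{bmatrix} \qquad \mathbf{z}_2 = \begin{bmatrix} 0.562\,74 \\ -0.420\,40 \\ -0.711\,76 \end{bmatrix} \qquad \mathbf{z}_3 = \begin{bmatrix} 0.163\,70 \\ -0.787\,30 \\ 0.594\,44 \end{bmatrix}$$

The eigenvectors of the original problem are recovered from Eq. (9.22): $\mathbf{u}_i = (\mathbf{L}^{-1})^T\mathbf{z}_i$,
which yields

$$\mathbf{u}_1 = \begin{bmatrix} 0.810\,27 \\ 0.451\,02 \\ 0.264\,62 \end{bmatrix} \qquad \mathbf{u}_2 = \begin{bmatrix} 0.562\,74 \\ -0.420\,40 \\ -0.503\,29 \end{bmatrix} \qquad \mathbf{u}_3 = \begin{bmatrix} 0.163\,70 \\ -0.787\,30 \\ 0.420\,33 \end{bmatrix}$$

These vectors should now be normalized (each $\mathbf{z}_i$ was normalized, but the transformation
to $\mathbf{u}_i$ does not preserve the magnitudes of vectors). The circular frequencies
are $\omega_i = \sqrt{\lambda_i/(LC)}$, so that

$$\omega_1 = \frac{0.3844}{\sqrt{LC}} \qquad \omega_2 = \frac{0.7631}{\sqrt{LC}} \qquad \omega_3 = \frac{1.3916}{\sqrt{LC}}$$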

EXAMPLE 9.3 



The propped cantilever beam carries a compressive axial load $P$. The lateral displacement
$u(x)$ of the beam can be shown to satisfy the differential equation

$$u^{(4)} + \frac{P}{EI}u'' = 0 \tag{a}$$

where $EI$ is the bending rigidity. The boundary conditions are

$$u(0) = u''(0) = 0 \qquad u(L) = u'(L) = 0 \tag{b}$$

(1) Show that displacement analysis of the beam results in a matrix eigenvalue
problem if the derivatives are approximated by finite differences. (2) Use the
Jacobi method to compute the lowest three buckling loads and the corresponding
eigenvectors.

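Solution of Part (1) We divide the beam into $n + 1$ segments of length $L/(n + 1)$
each as shown and enforce the differential equation at nodes 1 to $n$. Replacing the
derivatives of $u$ in Eq. (a) by central finite differences of $O(h^2)$ at the interior nodes
(nodes 1 to $n$), we obtain

$$\frac{u_{i-2} - 4u_{i-1} + 6u_i - 4u_{i+1} + u_{i+2}}{h^4} = \frac{P}{EI}\,\frac{-u_{i-1} + 2u_i - u_{i+1}}{h^2}, \quad i = 1, 2, \ldots, n$$

After multiplication by $h^4$, the equations become

$$\begin{aligned} u_{-1} - 4u_0 + 6u_1 - 4u_2 + u_3 &= \lambda(-u_0 + 2u_1 - u_2) \\ u_0 - 4u_1 + 6u_2 - 4u_3 + u_4 &= \lambda(-u_1 + 2u_2 - u_3) \\ &\ \,\vdots \\ u_{n-3} - 4u_{n-2} + 6u_{n-1} - 4u_n + u_{n+1} &= \lambda(-u_{n-2} + 2u_{n-1} - u_n) \\ u_{n-2} - 4u_{n-1} + 6u_n - 4u_{n+1} + u_{n+2} &= \lambda(-u_{n-1} + 2u_n - u_{n+1}) \end{aligned} \tag{c}$$

where

$$\lambda = \frac{Ph^2}{EI} = \frac{PL^2}{(n + 1)^2 EI}$$

The displacements $u_{-1}$, $u_0$, $u_{n+1}$ and $u_{n+2}$ can be eliminated by using the prescribed
boundary conditions. Referring to Table 8.1, we obtain the finite difference approximations
to the boundary conditions in Eqs. (b):

$$u_0 = 0 \qquad u_{-1} = -u_1 \qquad u_{n+1} = 0 \qquad u_{n+2} = u_n$$

Substitution into Eqs. (c) yields the matrix eigenvalue problem $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$, where

$$\mathbf{A} = \begin{bmatrix} 5 & -4 & 1 & 0 & 0 & \cdots & 0 \\ -4 & 6 & -4 & 1 & 0 & \cdots & 0 \\ 1 & -4 & 6 & -4 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -4 & 6 & -4 & 1 \\ 0 & \cdots & 0 & 1 & -4 & 6 & -4 \\ 0 & \cdots & 0 & 0 & 1 & -4 & 7 \end{bmatrix}$$

$$\mathbf{B} = \begin{bmatrix} 2 & -1 & 0 & 0 & 0 & \cdots & 0 \\ -1 & 2 & -1 & 0 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 2 & -1 & 0 \\ 0 & \cdots & 0 & 0 & -1 & 2 & -1 \\ 0 & \cdots & 0 & 0 & 0 & -1 & 2 \end{bmatrix}$$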
Solution of Part (2) The problem with the Jacobi method is that it insists on finding all
the eigenvalues and eigenvectors. It is also incapable of exploiting banded structures
of matrices. Thus the program listed below does much more work than necessary for
the problem at hand. More efficient methods of solution will be introduced later in
this chapter.

% Example 9.3 (Jacobi method)
n = 10;                               % Number of interior nodes.
A = zeros(n); B = zeros(n);           % Start constructing A and B.
for i = 1:n
    A(i,i) = 6; B(i,i) = 2;
end
A(1,1) = 5; A(n,n) = 7;
for i = 1:n-1
    A(i,i+1) = -4; A(i+1,i) = -4;
    B(i,i+1) = -1; B(i+1,i) = -1;
end
for i = 1:n-2
    A(i,i+2) = 1; A(i+2,i) = 1;
end
[H,T] = stdForm(A,B);                 % Convert to std. form.
[eVals,Z] = jacobi(H);                % Solve by Jacobi method.
X = T*Z;                              % Eigenvectors of orig. prob.
for i = 1:n                           % Normalize eigenvectors.
    xMag = sqrt(dot(X(:,i),X(:,i)));
    X(:,i) = X(:,i)/xMag;
end
[eVals,X] = sortEigen(eVals,X);       % Sort in ascending order.
eigenvalues = eVals(1:3)'             % Extract 3 smallest
eigenvectors = X(:,1:3)               % eigenvalues & vectors.

Running the program resulted in the following output: 


>> eigenvalues =
    0.1641    0.4720    0.9022
eigenvectors =
    0.1641   -0.1848    0.3070
    0.3062   -0.2682    0.3640
    0.4079   -0.1968    0.1467
    0.4574    0.0099   -0.1219
    0.4515    0.2685   -0.1725
    0.3961    0.4711    0.0677
    0.3052    0.5361    0.4089
    0.1986    0.4471    0.5704
    0.0988    0.2602    0.4334
    0.0270    0.0778    0.1486


The first three mode shapes, which represent the relative displacements of the 
buckled beam, are plotted below (we appended the zero end displacements to the 
eigenvectors before plotting the points). 



The buckling loads are given by $P_i = (n + 1)^2\lambda_i EI/L^2$. Thus

$$P_1 = \frac{(11)^2(0.1641)EI}{L^2} = 19.86\,\frac{EI}{L^2}$$

$$P_2 = \frac{(11)^2(0.4720)EI}{L^2} = 57.11\,\frac{EI}{L^2}$$

$$P_3 = \frac{(11)^2(0.9022)EI}{L^2} = 109.2\,\frac{EI}{L^2}$$

The analytical values are $P_1 = 20.19\,EI/L^2$, $P_2 = 59.68\,EI/L^2$ and $P_3 = 118.9\,EI/L^2$. It
can be seen that the error introduced by the finite difference approximation increases
with the mode number (the error in $P_{i+1}$ is larger than in $P_i$). Of course, the accuracy
of the finite difference model can be improved by using larger $n$, but beyond $n = 20$
the cost of computation with the Jacobi method becomes rather high.
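
The effect of mesh refinement on the lowest buckling load can be explored with a short script. This is a sketch only: it calls MATLAB's generalized eigenvalue solver eig(A,B) rather than the Jacobi routines of this chapter, and the list of mesh sizes is our own choice.

```matlab
nList = [10 20 40];                      % numbers of interior nodes to try
P1 = zeros(size(nList));                 % lowest buckling load in units of EI/L^2
for m = 1:numel(nList)
    n = nList(m);
    e1 = ones(n-1,1); e2 = ones(n-2,1);
    A = 6*eye(n) - 4*(diag(e1,1) + diag(e1,-1)) + diag(e2,2) + diag(e2,-2);
    A(1,1) = 5; A(n,n) = 7;              % corrections due to boundary conditions
    B = 2*eye(n) - diag(e1,1) - diag(e1,-1);
    lam = sort(real(eig(A,B)));          % eigenvalues of A*x = lambda*B*x
    P1(m) = (n + 1)^2*lam(1);            % P = (n+1)^2*lambda*EI/L^2
end
disp([nList' P1'])
```

The first entry should reproduce the value obtained above for n = 10, and the later entries should move toward the analytical value as n grows.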


9.3 Inverse Power and Power Methods 
Inverse Power Method 

The inverse power method is a simple iterative procedure for finding the smallest
eigenvalue $\lambda_1$ and the corresponding eigenvector $\mathbf{x}_1$ of

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{x} \tag{9.27}$$

The method works like this:

1. Let $\mathbf{v}$ be an approximation to $\mathbf{x}_1$ (a random vector of unit magnitude will do).

2. Solve

$$\mathbf{A}\mathbf{z} = \mathbf{v} \tag{9.28}$$

for the vector $\mathbf{z}$.

3. Compute $|\mathbf{z}|$.

4. Let $\mathbf{v} = \mathbf{z}/|\mathbf{z}|$ and repeat steps 2-4 until the change in $\mathbf{v}$ is negligible.

At the conclusion of the procedure, $|\mathbf{z}| = \pm 1/\lambda_1$ and $\mathbf{v} = \mathbf{x}_1$. The sign of $\lambda_1$ is
determined as follows: if $\mathbf{z}$ changes sign between successive iterations, $\lambda_1$ is negative;
otherwise, $\lambda_1$ is positive.
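
The four steps translate almost directly into code. The script below is a minimal sketch, not a finished routine: the matrix, the starting vector, the tolerance and the iteration limit are our own illustrative choices, and A\v re-solves the equations from scratch in every cycle (in practice one would decompose A once and reuse the factors).

```matlab
A = [4 -1 0; -1 4 -1; 0 -1 4];     % symmetric matrix (illustrative)
tol = 1.0e-10; maxIter = 200;      % assumed tolerance and iteration limit
v = [1; 1; 1]/sqrt(3);             % step 1: starting vector of unit magnitude
for iter = 1:maxIter
    z = A\v;                       % step 2: solve A*z = v
    zMag = norm(z);                % step 3: magnitude of z
    zSign = sign(dot(z,v));        % -1 if z reversed direction, +1 otherwise
    vNew = z/zMag;                 % step 4: normalize
    done = norm(vNew - zSign*v) < tol;   % change in v, allowing for a sign flip
    v = vNew;
    if done, break, end
end
lambda1 = zSign/zMag;              % |z| = 1/|lambda1| at convergence
fprintf('lambda1 = %.6f after %d iterations\n', lambda1, iter)
```

For this matrix the smallest eigenvalue is 4 minus the square root of 2, which the iteration should reproduce to within the tolerance.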

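Let us now investigate why the method works. Since the eigenvectors $\mathbf{x}_i$ of
Eq. (9.27) are orthonormal, they can be used as the basis for any $n$-dimensional vector.
Thus $\mathbf{v}$ and $\mathbf{z}$ admit the unique representations

$$\mathbf{v} = \sum_{i=1}^{n} v_i\mathbf{x}_i \qquad \mathbf{z} = \sum_{i=1}^{n} z_i\mathbf{x}_i \tag{a}$$

Note that $v_i$ and $z_i$ are not the elements of $\mathbf{v}$ and $\mathbf{z}$, but the components with respect
to the eigenvectors $\mathbf{x}_i$. Substitution into Eq. (9.28) yields

$$\mathbf{A}\sum_{i=1}^{n} z_i\mathbf{x}_i - \sum_{i=1}^{n} v_i\mathbf{x}_i = \mathbf{0}$$

But $\mathbf{A}\mathbf{x}_i = \lambda_i\mathbf{x}_i$, so that

$$\sum_{i=1}^{n} (z_i\lambda_i - v_i)\,\mathbf{x}_i = \mathbf{0}$$

Hence

$$z_i = \frac{v_i}{\lambda_i}$$

It follows from Eq. (a) that

$$\mathbf{z} = \sum_{i=1}^{n} \frac{v_i}{\lambda_i}\mathbf{x}_i = \frac{1}{\lambda_1}\sum_{i=1}^{n} v_i\frac{\lambda_1}{\lambda_i}\mathbf{x}_i = \frac{1}{\lambda_1}\left(v_1\mathbf{x}_1 + v_2\frac{\lambda_1}{\lambda_2}\mathbf{x}_2 + v_3\frac{\lambda_1}{\lambda_3}\mathbf{x}_3 + \cdots\right) \tag{9.29}$$

Since $|\lambda_1/\lambda_i| < 1$ ($i \neq 1$), we observe that the coefficient of $\mathbf{x}_1$ has become more prominent
in $\mathbf{z}$ than it was in $\mathbf{v}$; hence $\mathbf{z}$ is a better approximation to $\mathbf{x}_1$. This completes the
first iterative cycle.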
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In subsequent cycles we set v = z/|z| and repeat the process. Each iteration will 
increase the dominance of the first term in Eq. (9.29) so that the process converges to 

1 1 

Z = —ViXi = —Xi 
Ai Ai 

(at this stage v = xi, so that 14 = 1 , v 2 = V 3 = ■ ■ ■ = 0 ). 

The inverse power method also works with the nonstandard eigenvalue problem

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x} \tag{9.30}$$

provided that Eq. (9.28) is replaced by

$$\mathbf{A}\mathbf{z} = \mathbf{B}\mathbf{v} \tag{9.31}$$

The alternative is, of course, to transform the problem to standard form before applying
the power method.
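
The change from Eq. (9.28) to Eq. (9.31) amounts to one extra matrix-vector product per cycle. The following sketch illustrates this under stated assumptions: the pair of symmetric matrices is an arbitrary small example, the backslash operator replaces a dedicated decomposition, and the tolerance and iteration limit are illustrative.

```matlab
% Inverse power iteration for A*x = lambda*B*x (illustrative sketch).
A = [4 -1 0; -1 4 -1; 0 -1 4];     % assumed symmetric test matrix
B = [2 -1 0; -1 2 -1; 0 -1 1];     % assumed symmetric, positive definite
v = ones(3,1)/sqrt(3);             % starting vector of unit magnitude
for iter = 1:500
    vOld = v;
    z = A\(B*vOld);                % solve A*z = B*v, Eq. (9.31)
    zMag = norm(z);
    zSign = sign(dot(vOld,z));     % detect sign reversal of z
    v = zSign*z/zMag;              % normalize
    if norm(v - vOld) < 1.0e-8; break; end
end
lambda1 = zSign/zMag;              % smallest eigenvalue of A*x = lambda*B*x
residual = norm(A*v - lambda1*B*v);
```

A small value of `residual` is a convenient check that the pair (`lambda1`, `v`) really satisfies Eq. (9.30).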

Eigenvalue Shifting 

By inspection of Eq. (9.29) we see that the rate of convergence is determined by the
strength of the inequality $|\lambda_1/\lambda_2| < 1$ (the second term in the equation). If $|\lambda_2|$ is well
separated from $|\lambda_1|$, the inequality is strong and the convergence is rapid. On the other
hand, close proximity of these two eigenvalues results in very slow convergence.

The rate of convergence can be improved by a technique called eigenvalue
shifting. If we let

$$\lambda = \lambda^* + s \tag{9.32}$$

where $s$ is a predetermined "shift," the eigenvalue problem in Eq. (9.27) is transformed to

$$\mathbf{A}\mathbf{x} = (\lambda^* + s)\mathbf{x}$$

or

$$\mathbf{A}^*\mathbf{x} = \lambda^*\mathbf{x} \tag{9.33}$$

where

$$\mathbf{A}^* = \mathbf{A} - s\mathbf{I} \tag{9.34}$$

Solving the transformed problem in Eq. (9.33) by the inverse power method yields $\lambda_1^*$
and $\mathbf{x}_1$, where $\lambda_1^*$ is the smallest eigenvalue of $\mathbf{A}^*$. The corresponding eigenvalue of the
original problem, $\lambda = \lambda_1^* + s$, is thus the eigenvalue closest to $s$.

Eigenvalue shifting has two applications. An obvious one is the determination of
the eigenvalue closest to a certain value $s$. For example, if the working speed of a shaft
is $s$ rpm, it is imperative to ensure that there are no natural frequencies (which are
related to the eigenvalues) close to that speed.

Eigenvalue shifting is also used to speed up convergence. Suppose that we are
computing the smallest eigenvalue $\lambda_1$ of the matrix $\mathbf{A}$. The idea is to introduce a shift
$s$ that makes $\lambda_1^*/\lambda_2^*$ as small as possible. Since $\lambda_1^* = \lambda_1 - s$, we should choose $s \approx \lambda_1$
($s = \lambda_1$ should be avoided to prevent division by zero). Of course, this method works
only if we have a prior estimate of $\lambda_1$.

The inverse power method with eigenvalue shifting is a particularly powerful tool
for finding eigenvectors if the eigenvalues are known. By shifting very close to an
eigenvalue, the corresponding eigenvector can be computed in one or two iterations.
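
The effect of the shift on the iteration count is easy to observe numerically. The sketch below runs the same inverse power loop twice on the matrix of Example 9.5, once with $s = 0$ and once with $s = 5$. The fixed starting vector, the tolerance and the use of the backslash operator are assumptions made for illustration, so the iteration counts will not necessarily match those quoted elsewhere for a random start.

```matlab
% Iteration counts with and without eigenvalue shifting (illustrative sketch).
A = [11 2 3 1 4; 2 9 3 5 2; 3 3 15 4 3; 1 5 4 12 4; 4 2 3 4 17];
shifts = [0 5];                        % no shift, then shift s = 5
numIter = zeros(1,2); eVals = zeros(1,2);
for j = 1:2
    s = shifts(j);
    As = A - s*eye(5);                 % A* = A - s*I, Eq. (9.34)
    v = ones(5,1)/sqrt(5);             % assumed starting vector
    for i = 1:500
        vOld = v;
        z = As\vOld;                   % solve A* z = v
        zMag = norm(z);
        zSign = sign(dot(vOld,z));     % detect sign reversal of z
        v = zSign*z/zMag;
        if norm(v - vOld) < 1.0e-6; break; end
    end
    numIter(j) = i;                    % cycles needed for this shift
    eVals(j) = s + zSign/zMag;         % lambda = lambda* + s, Eq. (9.32)
end
```

Both runs should arrive at the same eigenvalue; comparing `numIter(1)` with `numIter(2)` shows how much work the shift saves.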

Power Method 

The power method converges to the eigenvalue farthest from zero and the associated
eigenvector. It is very similar to the inverse power method; the only difference between
the two methods is the interchange of $\mathbf{v}$ and $\mathbf{z}$ in Eq. (9.28). The outline of the
procedure is:

1. Let $\mathbf{v}$ be an approximation to $\mathbf{x}_n$ (a random vector of unit magnitude will do).

2. Compute the vector

$$\mathbf{z} = \mathbf{A}\mathbf{v} \tag{9.35}$$

3. Compute $|\mathbf{z}|$.

4. Let $\mathbf{v} = \mathbf{z}/|\mathbf{z}|$ and repeat steps 2-4 until the change in $\mathbf{v}$ is negligible.

At the conclusion of the procedure, $|\mathbf{z}| = \pm\lambda_n$ and $\mathbf{v} = \mathbf{x}_n$ (the sign of $\lambda_n$ is determined
in the same way as in the inverse power method).


■ invPower 

Given the matrix A and the scalar s, the function invPower returns the eigenvalue of A
closest to s and the corresponding eigenvector. The matrix A* = A - sI is decomposed
as soon as it is formed, so that only the solution phase (forward and back substitution)
is needed in the iterative loop. If A is banded, the efficiency of the program can
be improved by replacing LUdec and LUsol by functions that specialize in banded
matrices (see Example 9.6). The program line that forms A* must also be modified to
be compatible with the storage scheme used for A.

function [eVal,eVec] = invPower(A,s,maxIter,tol)
% Inverse power method for finding the eigenvalue of A
% closest to s & the corresponding eigenvector.
% USAGE: [eVal,eVec] = invPower(A,s,maxIter,tol)
% maxIter = limit on number of iterations (default is 50).
% tol     = error tolerance (default is 1.0e-6).

if nargin < 4; tol = 1.0e-6; end
if nargin < 3; maxIter = 50; end
n = size(A,1);
A = A - eye(n)*s;                        % Form A* = A - sI
A = LUdec(A);                            % Decompose A*
x = rand(n,1);                           % Seed eigenvecs. with random numbers
xMag = sqrt(dot(x,x)); x = x/xMag;       % Normalize x
for i = 1:maxIter
    xOld = x;                            % Save current eigenvecs.
    x = LUsol(A,x);                      % Solve A*x = xOld
    xMag = sqrt(dot(x,x)); x = x/xMag;   % Normalize x
    xSign = sign(dot(xOld,x));           % Detect sign change of x
    x = x*xSign;
    % Check for convergence
    if sqrt(dot(xOld - x,xOld - x)) < tol
        eVal = s + xSign/xMag; eVec = x;
        return
    end
end
error('Too many iterations')


EXAMPLE 9.4 

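The stress matrix describing the state of stress at a point is

$$\mathbf{S} = \begin{bmatrix} -30 & 10 & 20 \\ 10 & 40 & -50 \\ 20 & -50 & -10 \end{bmatrix} \text{ MPa}$$

Determine the largest principal stress (the eigenvalue of S farthest from zero) by the
power method.

Solution First iteration:

Let $\mathbf{v} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T$ be the initial guess for the eigenvector. Then

$$\mathbf{z} = \mathbf{S}\mathbf{v} = \begin{bmatrix} -30 & 10 & 20 \\ 10 & 40 & -50 \\ 20 & -50 & -10 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -30.0 \\ 10.0 \\ 20.0 \end{bmatrix}$$

$$|\mathbf{z}| = \sqrt{30^2 + 10^2 + 20^2} = 37.417$$

$$\mathbf{v} = \frac{\mathbf{z}}{|\mathbf{z}|} = \frac{1}{37.417}\begin{bmatrix} -30.0 \\ 10.0 \\ 20.0 \end{bmatrix} = \begin{bmatrix} -0.80177 \\ 0.26726 \\ 0.53452 \end{bmatrix}$$

Second iteration:

$$\mathbf{z} = \mathbf{S}\mathbf{v} = \begin{bmatrix} -30 & 10 & 20 \\ 10 & 40 & -50 \\ 20 & -50 & -10 \end{bmatrix} \begin{bmatrix} -0.80177 \\ 0.26726 \\ 0.53452 \end{bmatrix} = \begin{bmatrix} 37.416 \\ -24.053 \\ -34.744 \end{bmatrix}$$

$$|\mathbf{z}| = \sqrt{37.416^2 + 24.053^2 + 34.744^2} = 56.442$$

$$\mathbf{v} = \frac{\mathbf{z}}{|\mathbf{z}|} = \frac{1}{56.442}\begin{bmatrix} 37.416 \\ -24.053 \\ -34.744 \end{bmatrix} = \begin{bmatrix} 0.66291 \\ -0.42615 \\ -0.61557 \end{bmatrix}$$

Third iteration:

$$\mathbf{z} = \mathbf{S}\mathbf{v} = \begin{bmatrix} -30 & 10 & 20 \\ 10 & 40 & -50 \\ 20 & -50 & -10 \end{bmatrix} \begin{bmatrix} 0.66291 \\ -0.42615 \\ -0.61557 \end{bmatrix} = \begin{bmatrix} -36.460 \\ 20.362 \\ 40.721 \end{bmatrix}$$

$$|\mathbf{z}| = \sqrt{36.460^2 + 20.362^2 + 40.721^2} = 58.328$$

$$\mathbf{v} = \frac{\mathbf{z}}{|\mathbf{z}|} = \frac{1}{58.328}\begin{bmatrix} -36.460 \\ 20.362 \\ 40.721 \end{bmatrix} = \begin{bmatrix} -0.62509 \\ 0.34909 \\ 0.69814 \end{bmatrix}$$

At this point the approximation of the eigenvalue we seek is $\lambda = -58.328$ MPa (the
negative sign is determined by the sign reversal of $\mathbf{z}$ between iterations). This is
actually close to the second-largest eigenvalue $\lambda_2 = -58.39$ MPa! By continuing the
iterative process we would eventually end up with the largest eigenvalue $\lambda_3 = 70.94$
MPa. But since $|\lambda_2|$ and $|\lambda_3|$ are rather close, the convergence is too slow from this
point on for manual labor. Here is a program that does the calculations for us: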


% Example 9.4 (Power method)
S = [-30 10 20; 10 40 -50; 20 -50 -10];
v = [1; 0; 0];
for i = 1:100
    vOld = v; z = S*v; zMag = sqrt(dot(z,z));
    v = z/zMag; vSign = sign(dot(vOld,v));
    v = v*vSign;
    if sqrt(dot(vOld - v,vOld - v)) < 1.0e-6
        eVal = vSign*zMag
        numIter = i
        return
    end
end
error('Too many iterations')

The results are:

>> eVal =
   70.9435
numIter =
    93

Note that it took 93 iterations to reach convergence.


EXAMPLE 9.5 

Determine the smallest eigenvalue $\lambda_1$ and the corresponding eigenvector of

$$\mathbf{A} = \begin{bmatrix} 11 & 2 & 3 & 1 & 4 \\ 2 & 9 & 3 & 5 & 2 \\ 3 & 3 & 15 & 4 & 3 \\ 1 & 5 & 4 & 12 & 4 \\ 4 & 2 & 3 & 4 & 17 \end{bmatrix}$$

Use the inverse power method with eigenvalue shifting knowing that $\lambda_1 \approx 5$.

Solution

% Example 9.5 (Inverse power method)
s = 5;
A = [11 2  3  1  4;
      2 9  3  5  2;
      3 3 15  4  3;
      1 5  4 12  4;
      4 2  3  4 17];
[eVal,eVec] = invPower(A,s)

Here is the output:

>> eVal =
    4.8739
eVec =
    0.2673
   -0.7414
   -0.0502
    0.5949
   -0.1497

Convergence was achieved with 4 iterations. Without the eigenvalue shift 26 iterations
would be required.

EXAMPLE 9.6 

Unlike Jacobi diagonalization, the inverse power method lends itself to eigenvalue
problems of banded matrices. Write a program that computes the smallest buckling
load of the beam described in Example 9.3, making full use of the banded forms. Run
the program with 100 interior nodes (n = 100).

Solution The function invPower5 listed below returns the smallest eigenvalue and
the corresponding eigenvector of $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$, where A is a pentadiagonal matrix and
B is a sparse matrix (in this problem it is tridiagonal). The matrix A is input by its
diagonals d, e and f as was done in Art. 2.4 in conjunction with the LU decomposition.
The algorithm for invPower5 does not use B directly, but calls the function func(v)
that supplies the product Bv. Eigenvalue shifting is not used.

function [eVal,eVec] = invPower5(func,d,e,f)
% Finds smallest eigenvalue of A*x = lambda*B*x by
% the inverse power method.
% USAGE: [eVal,eVec] = invPower5(func,d,e,f)
% Matrix A must be pentadiagonal and stored in form
%        A = [f\e\d\e\f].
% func = handle of function that returns B*v.

n = length(d);
[d,e,f] = LUdec5(d,e,f);                 % Decompose A
x = rand(n,1);                           % Seed x with random numbers
xMag = sqrt(dot(x,x)); x = x/xMag;       % Normalize x
for i = 1:50
    xOld = x;                            % Save current x
    x = LUsol5(d,e,f,feval(func,x));     % Solve [A]{x} = [B]{xOld}
    xMag = sqrt(dot(x,x)); x = x/xMag;   % Normalize x
    xSign = sign(dot(xOld,x));           % Detect sign change of x
    x = x*xSign;
    % Check for convergence
    if sqrt(dot(xOld - x,xOld - x)) < 1.0e-6
        eVal = xSign/xMag; eVec = x;
        return
    end
end
error('Too many iterations')

The function that computes Bv is

function Bv = fex9_6(v)
% Computes the product B*v in Example 9.6.

n = length(v);
Bv = zeros(n,1);
for i = 2:n-1
    Bv(i) = -v(i-1) + 2*v(i) - v(i+1);
end
Bv(1) = 2*v(1) - v(2);
Bv(n) = -v(n-1) + 2*v(n);

Here is the program that calls invPower5:

% Example 9.6 (Inverse power method for pentadiagonal A)
n = 100;
d = ones(n,1)*6;
d(1) = 5; d(n) = 7;
e = ones(n-1,1)*(-4);
f = ones(n-2,1);
[eVal,eVec] = invPower5(@fex9_6,d,e,f);
fprintf('PL^2/EI =')
fprintf('%9.4f',eVal*(n+1)^2)

The output, shown below, is in excellent agreement with the analytical value.

>> PL^2/EI =  20.1867

PROBLEM SET 9.1 


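1. Given

$$\mathbf{A} = \begin{bmatrix} 7 & 3 & 1 \\ 3 & 9 & 6 \\ 1 & 6 & 8 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 4 \end{bmatrix}$$

convert the eigenvalue problem $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$ to the standard form $\mathbf{H}\mathbf{z} = \lambda\mathbf{z}$. What is
the relationship between $\mathbf{x}$ and $\mathbf{z}$?

2. Convert the eigenvalue problem $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$, where

$$\mathbf{A} = \begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}$$

to the standard form.

3. An eigenvalue of the problem in Prob. 2 is roughly 2.5. Use the inverse power
method with eigenvalue shifting to compute this eigenvalue to four decimal
places. Start with $\mathbf{x} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T$. Hint: two iterations should be sufficient.

4. The stress matrix at a point is

$$\mathbf{S} = \begin{bmatrix} 150 & -60 & 0 \\ -60 & 120 & 0 \\ 0 & 0 & 80 \end{bmatrix} \text{ MPa}$$

Compute the principal stresses (eigenvalues of S).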

5. [Figure: two pendulums connected by a spring]

The two pendulums are connected by a spring which is undeformed when the
pendulums are vertical. The equations of motion of the system can be shown
to be

$$kL(\theta_2 - \theta_1) - mg\theta_1 = mL\ddot{\theta}_1$$
$$-kL(\theta_2 - \theta_1) - 2mg\theta_2 = 2mL\ddot{\theta}_2$$

where $\theta_1$ and $\theta_2$ are the angular displacements and $k$ is the spring stiffness.
Determine the circular frequencies of vibration and the relative amplitudes
of the angular displacements. Use $m = 0.25$ kg, $k = 20$ N/m, $L = 0.75$ m and
$g = 9.80665$ m/s$^2$.

6. [Figure: electric circuit with inductors L]

Kirchhoff's laws for the electric circuit are

$$3i_1 - i_2 - i_3 = -LC\frac{d^2 i_1}{dt^2}$$
$$-i_1 + i_2 = -LC\frac{d^2 i_2}{dt^2}$$
$$-i_1 + i_3 = -LC\frac{d^2 i_3}{dt^2}$$

Compute the circular frequencies of the circuit and the relative amplitudes of the
loop currents.

7. Compute the matrix $\mathbf{A}^*$ that results from annihilation of $A_{14}$ and $A_{41}$ in the matrix

$$\mathbf{A} = \begin{bmatrix} 4 & -1 & 0 & 1 \\ -1 & 6 & -2 & 0 \\ 0 & -2 & 3 & 2 \\ 1 & 0 & 2 & 4 \end{bmatrix}$$

by a Jacobi rotation.

8. ■ Use the Jacobi method to determine the eigenvalues and eigenvectors of

$$\mathbf{A} = \begin{bmatrix} 4 & -1 & -2 \\ -1 & 3 & 3 \\ -2 & 3 & 1 \end{bmatrix}$$

9. ■ Find the eigenvalues and eigenvectors of

$$\mathbf{A} = \begin{bmatrix} 4 & -2 & 1 & -1 \\ -2 & 4 & -2 & 1 \\ 1 & -2 & 4 & -2 \\ -1 & 1 & -2 & 4 \end{bmatrix}$$

with the Jacobi method.

10. ■ Use the power method to compute the largest eigenvalue and the corresponding
eigenvector of the matrix A given in Prob. 9.

11. ■ Find the smallest eigenvalue and the corresponding eigenvector of the matrix
A in Prob. 9. Use the inverse power method.

12. ■

$$\mathbf{A} = \begin{bmatrix} 1.4 & 0.8 & 0.4 \\ 0.8 & 6.6 & 0.8 \\ 0.4 & 0.8 & 5.0 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 0.4 & -0.1 & 0.0 \\ -0.1 & 0.4 & -0.1 \\ 0.0 & -0.1 & 0.4 \end{bmatrix}$$

Find the eigenvalues and eigenvectors of $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$ by the Jacobi method.

13. ■ Use the inverse power method to compute the smallest eigenvalue in Prob. 12.

14. ■ Use the Jacobi method to compute the eigenvalues and eigenvectors of the
matrix

$$\mathbf{A} = \begin{bmatrix} 11 & 2 & 3 & 1 & 4 & 2 \\ 2 & 9 & 3 & 5 & 2 & 1 \\ 3 & 3 & 15 & 4 & 3 & 2 \\ 1 & 5 & 4 & 12 & 4 & 3 \\ 4 & 2 & 3 & 4 & 17 & 5 \\ 2 & 1 & 2 & 3 & 5 & 8 \end{bmatrix}$$


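15. ■ Find the eigenvalues of $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$ by the Jacobi method, where

$$\mathbf{A} = \begin{bmatrix} 6 & -4 & 1 & 0 \\ -4 & 6 & -4 & 1 \\ 1 & -4 & 6 & -4 \\ 0 & 1 & -4 & 7 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & -2 & 3 & -1 \\ -2 & 6 & -2 & 3 \\ 3 & -2 & 6 & -2 \\ -1 & 3 & -2 & 9 \end{bmatrix}$$

Warning: B is not positive definite.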

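16. ■ [Figure: cantilever beam with a superimposed finite difference mesh]

The figure shows a cantilever beam with a superimposed finite difference mesh. If
$u(x, t)$ is the lateral displacement of the beam, the differential equation of motion
governing bending vibrations is

$$u^{(4)} = -\frac{\gamma}{EI}\ddot{u}$$

where $\gamma$ is the mass per unit length and $EI$ is the bending rigidity. The boundary
conditions are $u(0, t) = u'(0, t) = u''(L, t) = u'''(L, t) = 0$. With $u(x, t) = y(x)\sin\omega t$
the problem becomes

$$y^{(4)} = \frac{\omega^2\gamma}{EI}y \qquad y(0) = y'(0) = y''(L) = y'''(L) = 0$$

The corresponding finite difference equations are

$$\begin{bmatrix} 7 & -4 & 1 & 0 & 0 & \cdots & 0 \\ -4 & 6 & -4 & 1 & 0 & \cdots & 0 \\ 1 & -4 & 6 & -4 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & -4 & 6 & -4 & 1 \\ 0 & \cdots & 0 & 1 & -4 & 5 & -2 \\ 0 & \cdots & 0 & 0 & 1 & -2 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-2} \\ y_{n-1} \\ y_n \end{bmatrix} = \lambda \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-2} \\ y_{n-1} \\ y_n/2 \end{bmatrix}$$

where

$$\lambda = \frac{\omega^2\gamma}{EI}\left(\frac{L}{n}\right)^4$$

(a) Write down the matrix $\mathbf{H}$ of the standard form $\mathbf{H}\mathbf{z} = \lambda\mathbf{z}$ and the transformation
matrix $\mathbf{P}$ as in $\mathbf{y} = \mathbf{P}\mathbf{z}$. (b) Write a program that computes the lowest two circular
frequencies of the beam and the corresponding mode shapes (eigenvectors) using
the Jacobi method. Run the program with $n = 10$. Note: the analytical solution for
the lowest circular frequency is $\omega_1 = (3.515/L^2)\sqrt{EI/\gamma}$.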

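17. ■ [Figure: (a) simply supported column made of three segments; (b) finite difference model of half the column, nodes 0 to 10]

The simply supported column in Fig. (a) consists of three segments with the
bending rigidities shown. If only the first buckling mode is of interest, it is
sufficient to model half of the beam as shown in Fig. (b). The differential equation
for the lateral displacement $u(x)$ is

$$u'' = -\frac{P}{EI}u$$

with the boundary conditions $u(0) = u'(L/2) = 0$. The corresponding finite difference
equations are

$$\begin{bmatrix} 2 & -1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & -1 & 2 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & -1 & 2 & -1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & -1 & 2 & -1 & \cdots & 0 \\ \vdots & & & & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & 0 & 0 & -1 & 2 & -1 \\ 0 & \cdots & 0 & 0 & 0 & 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ \vdots \\ u_9 \\ u_{10} \end{bmatrix} = \lambda \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5/1.5 \\ u_6/2 \\ \vdots \\ u_9/2 \\ u_{10}/4 \end{bmatrix}$$

where

$$\lambda = \frac{P}{EI_0}\left(\frac{L}{20}\right)^2$$

Write a program that computes the lowest buckling load $P$ of the column with
the inverse power method. Utilize the banded forms of the matrices.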



18. The springs supporting the three-bar linkage are undeformed when the linkage
is horizontal. The equilibrium equations of the linkage in the presence of the
horizontal force $P$ can be shown to be



where $k$ is the spring stiffness. Determine the smallest buckling load $P$ and the
corresponding mode shape. Hint: the equations can easily be rewritten in the standard
form $\mathbf{A}\boldsymbol{\theta} = \lambda\boldsymbol{\theta}$, where $\mathbf{A}$ is symmetric.



























19. The differential equations of motion for the mass-spring system are

$$k(-2u_1 + u_2) = m\ddot{u}_1$$
$$k(u_1 - 2u_2 + u_3) = 3m\ddot{u}_2$$
$$k(u_2 - 2u_3) = 2m\ddot{u}_3$$

where $u_i(t)$ is the displacement of mass $i$ from its equilibrium position and $k$
is the spring stiffness. Determine the circular frequencies of vibration and the
corresponding mode shapes.

20. ■ [Figure: electric circuit with inductors L and capacitors C, C/2, C/3, C/4]

Kirchhoff's equations for the circuit are

$$L\frac{d^2 i_1}{dt^2} + \frac{1}{C}i_1 + \frac{2}{C}(i_1 - i_2) = 0$$
$$L\frac{d^2 i_2}{dt^2} + \frac{2}{C}(i_2 - i_1) + \frac{3}{C}(i_2 - i_3) = 0$$
$$L\frac{d^2 i_3}{dt^2} + \frac{3}{C}(i_3 - i_2) + \frac{4}{C}(i_3 - i_4) = 0$$
$$L\frac{d^2 i_4}{dt^2} + \frac{4}{C}(i_4 - i_3) + \frac{5}{C}i_4 = 0$$

Find the circular frequencies of the currents.

21. ■ [Figure: electric circuit]

Determine the circular frequencies of oscillation for the circuit shown, given the
Kirchhoff equations

$$L\frac{d^2 i_1}{dt^2} + L\left(\frac{d^2 i_1}{dt^2} - \frac{d^2 i_2}{dt^2}\right) + \frac{1}{C}i_1 = 0$$
$$L\left(\frac{d^2 i_2}{dt^2} - \frac{d^2 i_1}{dt^2}\right) + L\left(\frac{d^2 i_2}{dt^2} - \frac{d^2 i_3}{dt^2}\right) + \frac{2}{C}i_2 = 0$$
$$L\left(\frac{d^2 i_3}{dt^2} - \frac{d^2 i_2}{dt^2}\right) + L\left(\frac{d^2 i_3}{dt^2} - \frac{d^2 i_4}{dt^2}\right) + \frac{3}{C}i_3 = 0$$
$$L\left(\frac{d^2 i_4}{dt^2} - \frac{d^2 i_3}{dt^2}\right) + L\frac{d^2 i_4}{dt^2} + \frac{4}{C}i_4 = 0$$


22. ■ Several iterative methods exist for finding the eigenvalues of a matrix A. One of
these is the LR method, which requires the matrix to be symmetric and positive
definite. Its algorithm is very simple:

Let $\mathbf{A}_0 = \mathbf{A}$
do with i = 0, 1, 2, ...
    Use Choleski's decomposition $\mathbf{A}_i = \mathbf{L}_i\mathbf{L}_i^T$ to compute $\mathbf{L}_i$
    Form $\mathbf{A}_{i+1} = \mathbf{L}_i^T\mathbf{L}_i$
end do

It can be shown that the diagonal elements of $\mathbf{A}_{i+1}$ converge to the eigenvalues of
A. Write a program that implements the LR method and test it with

$$\mathbf{A} = \begin{bmatrix} 4 & 3 & 1 \\ 3 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}$$


9.4 Householder Reduction to Tridiagonal Form 

It was mentioned before that similarity transformations can be used to transform an
eigenvalue problem to a form that is easier to solve. The most desirable of the "easy"
forms is, of course, the diagonal form that results from the Jacobi method. However,
the Jacobi method requires about $10n^3$ to $20n^3$ multiplications, so that the amount of
computation increases very rapidly with $n$. We are generally better off by reducing the
matrix to the tridiagonal form, which can be done in precisely $n - 2$ transformations
by the Householder method. Once the tridiagonal form is achieved, we still have to
extract the eigenvalues and the eigenvectors, but there are effective means of dealing
with that, as we see in the next article.

Householder Matrix 

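Householder's transformation utilizes the Householder matrix

$$\mathbf{Q} = \mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H} \tag{9.36}$$

where $\mathbf{u}$ is a vector and

$$H = \frac{1}{2}\mathbf{u}^T\mathbf{u} = \frac{1}{2}|\mathbf{u}|^2 \tag{9.37}$$

Note that $\mathbf{u}\mathbf{u}^T$ in Eq. (9.36) is the outer product; that is, a matrix with the elements
$(\mathbf{u}\mathbf{u}^T)_{ij} = u_i u_j$. Since $\mathbf{Q}$ is obviously symmetric ($\mathbf{Q}^T = \mathbf{Q}$), we can write

$$\mathbf{Q}^T\mathbf{Q} = \mathbf{Q}\mathbf{Q} = \left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H}\right)\left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H}\right) = \mathbf{I} - 2\frac{\mathbf{u}\mathbf{u}^T}{H} + \frac{\mathbf{u}\left(\mathbf{u}^T\mathbf{u}\right)\mathbf{u}^T}{H^2}$$

$$= \mathbf{I} - 2\frac{\mathbf{u}\mathbf{u}^T}{H} + \frac{\mathbf{u}(2H)\mathbf{u}^T}{H^2} = \mathbf{I}$$

which shows that $\mathbf{Q}$ is also orthogonal.

Now let $\mathbf{x}$ be an arbitrary vector and consider the transformation $\mathbf{Q}\mathbf{x}$. Choosing

$$\mathbf{u} = \mathbf{x} + k\mathbf{e}_1 \tag{9.38}$$

where

$$k = \pm|\mathbf{x}| \qquad \mathbf{e}_1 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \end{bmatrix}^T$$

we get

$$\mathbf{Q}\mathbf{x} = \left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H}\right)\mathbf{x} = \left[\mathbf{I} - \frac{\mathbf{u}\left(\mathbf{x} + k\mathbf{e}_1\right)^T}{H}\right]\mathbf{x}$$

$$= \mathbf{x} - \frac{\mathbf{u}\left(\mathbf{x}^T\mathbf{x} + k\mathbf{e}_1^T\mathbf{x}\right)}{H} = \mathbf{x} - \frac{\mathbf{u}\left(k^2 + kx_1\right)}{H}$$

But

$$2H = \left(\mathbf{x} + k\mathbf{e}_1\right)^T\left(\mathbf{x} + k\mathbf{e}_1\right) = |\mathbf{x}|^2 + k\left(\mathbf{x}^T\mathbf{e}_1 + \mathbf{e}_1^T\mathbf{x}\right) + k^2\mathbf{e}_1^T\mathbf{e}_1$$

$$= k^2 + 2kx_1 + k^2 = 2\left(k^2 + kx_1\right)$$

so that

$$\mathbf{Q}\mathbf{x} = \mathbf{x} - \mathbf{u} = -k\mathbf{e}_1 = \begin{bmatrix} -k & 0 & 0 & \cdots & 0 \end{bmatrix}^T \tag{9.39}$$

Hence the transformation eliminates all elements of $\mathbf{x}$ except the first one.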
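
The properties claimed for $\mathbf{Q}$ can be checked numerically in a few lines. The sketch below builds $\mathbf{Q}$ from Eqs. (9.36) to (9.38) for one arbitrarily chosen vector and measures how far it is from being symmetric and orthogonal, and how far $\mathbf{Q}\mathbf{x}$ is from $-k\mathbf{e}_1$. The vector and the variable names are illustrative assumptions.

```matlab
% Numerical check of the Householder matrix (illustrative sketch).
x = [2; 3; -1];                  % assumed test vector
k = norm(x);                     % k = +|x| or -|x| ...
if x(1) < 0; k = -k; end         % ... with the same sign as x(1)
e1 = [1; 0; 0];
u = x + k*e1;                    % Eq. (9.38)
H = dot(u,u)/2;                  % Eq. (9.37)
Q = eye(3) - u*u'/H;             % Eq. (9.36); u*u' is the outer product
symErr  = norm(Q - Q');          % zero if Q is symmetric
orthErr = norm(Q*Q - eye(3));    % zero if Q is orthogonal
Qx = Q*x;                        % should equal [-k; 0; 0], Eq. (9.39)
elimErr = norm(Qx - [-k; 0; 0]);
```

All three error measures should be of the order of the roundoff error.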


Householder Reduction of a Symmetric Matrix 


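Let us now apply the following transformation to a symmetric $n \times n$ matrix $\mathbf{A}$:

$$\mathbf{P}_1\mathbf{A} = \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & \mathbf{Q} \end{bmatrix} \begin{bmatrix} A_{11} & \mathbf{x}^T \\ \mathbf{x} & \mathbf{A}' \end{bmatrix} = \begin{bmatrix} A_{11} & \mathbf{x}^T \\ \mathbf{Q}\mathbf{x} & \mathbf{Q}\mathbf{A}' \end{bmatrix} \tag{9.40}$$

Here $\mathbf{x}$ represents the first column of $\mathbf{A}$ with the first element omitted, and $\mathbf{A}'$
is simply $\mathbf{A}$ with its first row and column removed. The matrix $\mathbf{Q}$ of dimensions
$(n - 1) \times (n - 1)$ is constructed using Eqs. (9.36) to (9.38). Referring to Eq. (9.39), we
see that the transformation reduces the first column of $\mathbf{A}$ to

$$\begin{bmatrix} A_{11} \\ \mathbf{Q}\mathbf{x} \end{bmatrix} = \begin{bmatrix} A_{11} \\ -k \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

The transformation

$$\mathbf{A} \leftarrow \mathbf{P}_1\mathbf{A}\mathbf{P}_1 = \begin{bmatrix} A_{11} & \left(\mathbf{Q}\mathbf{x}\right)^T \\ \mathbf{Q}\mathbf{x} & \mathbf{Q}\mathbf{A}'\mathbf{Q} \end{bmatrix} \tag{9.41}$$

thus tridiagonalizes the first row as well as the first column of $\mathbf{A}$. Here is a diagram of
the transformation for a $4 \times 4$ matrix:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & & & \\ 0 & & \mathbf{Q} & \\ 0 & & & \end{bmatrix} \cdot \begin{bmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & & & \\ A_{31} & & \mathbf{A}' & \\ A_{41} & & & \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & & & \\ 0 & & \mathbf{Q} & \\ 0 & & & \end{bmatrix} = \begin{bmatrix} A_{11} & -k & 0 & 0 \\ -k & & & \\ 0 & & \mathbf{Q}\mathbf{A}'\mathbf{Q} & \\ 0 & & & \end{bmatrix}$$

The second row and column of $\mathbf{A}$ are reduced next by applying the transformation to
the $3 \times 3$ lower right portion of the matrix. This transformation can be expressed as
$\mathbf{A} \leftarrow \mathbf{P}_2\mathbf{A}\mathbf{P}_2$, where now

$$\mathbf{P}_2 = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0}^T \\ \mathbf{0} & \mathbf{Q} \end{bmatrix} \tag{9.42}$$

In Eq. (9.42) $\mathbf{I}_2$ is a $2 \times 2$ identity matrix and $\mathbf{Q}$ is a $(n - 2) \times (n - 2)$ matrix constructed
by choosing for $\mathbf{x}$ the bottom $n - 2$ elements of the second column of $\mathbf{A}$. It takes a total
of $n - 2$ transformations with

$$\mathbf{P}_i = \begin{bmatrix} \mathbf{I}_i & \mathbf{0}^T \\ \mathbf{0} & \mathbf{Q} \end{bmatrix}, \quad i = 1, 2, \ldots, n - 2$$

to attain the tridiagonal form.

It is wasteful to form $\mathbf{P}_i$ and then carry out the matrix multiplication $\mathbf{P}_i\mathbf{A}\mathbf{P}_i$. We
note that

$$\mathbf{A}'\mathbf{Q} = \mathbf{A}'\left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H}\right) = \mathbf{A}' - \frac{\mathbf{A}'\mathbf{u}}{H}\mathbf{u}^T = \mathbf{A}' - \mathbf{v}\mathbf{u}^T$$

where

$$\mathbf{v} = \frac{\mathbf{A}'\mathbf{u}}{H} \tag{9.43}$$

Therefore,

$$\mathbf{Q}\mathbf{A}'\mathbf{Q} = \left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H}\right)\left(\mathbf{A}' - \mathbf{v}\mathbf{u}^T\right) = \mathbf{A}' - \mathbf{v}\mathbf{u}^T - \frac{\mathbf{u}\mathbf{u}^T}{H}\left(\mathbf{A}' - \mathbf{v}\mathbf{u}^T\right)$$

$$= \mathbf{A}' - \mathbf{v}\mathbf{u}^T - \frac{\mathbf{u}\left(\mathbf{u}^T\mathbf{A}'\right)}{H} + \frac{\mathbf{u}\left(\mathbf{u}^T\mathbf{v}\right)\mathbf{u}^T}{H}$$

$$= \mathbf{A}' - \mathbf{v}\mathbf{u}^T - \mathbf{u}\mathbf{v}^T + 2g\mathbf{u}\mathbf{u}^T$$

where

$$g = \frac{\mathbf{u}^T\mathbf{v}}{2H} \tag{9.44}$$

Letting

$$\mathbf{w} = \mathbf{v} - g\mathbf{u} \tag{9.45}$$

it can be easily verified that the transformation can be written as

$$\mathbf{Q}\mathbf{A}'\mathbf{Q} = \mathbf{A}' - \mathbf{w}\mathbf{u}^T - \mathbf{u}\mathbf{w}^T \tag{9.46}$$

which gives us the following computational procedure, to be carried out with
$i = 1, 2, \ldots, n - 2$:

1. Let $\mathbf{A}'$ be the $(n - i) \times (n - i)$ lower right-hand portion of $\mathbf{A}$.

2. Let $\mathbf{x} = \begin{bmatrix} A_{i+1,i} & A_{i+2,i} & \cdots & A_{n,i} \end{bmatrix}^T$ (the column of length $n - i$ just to the left of $\mathbf{A}'$).

3. Compute $|\mathbf{x}|$. Let $k = |\mathbf{x}|$ if $x_1 \geq 0$ and $k = -|\mathbf{x}|$ if $x_1 < 0$ (this choice of sign minimizes
the roundoff error).

4. Let $\mathbf{u} = \begin{bmatrix} k + x_1 & x_2 & x_3 & \cdots & x_{n-i} \end{bmatrix}^T$.

5. Compute $H = |\mathbf{u}|^2/2$.

6. Compute $\mathbf{v} = \mathbf{A}'\mathbf{u}/H$.

7. Compute $g = \mathbf{u}^T\mathbf{v}/(2H)$.

8. Compute $\mathbf{w} = \mathbf{v} - g\mathbf{u}$.

9. Compute the transformation $\mathbf{A}' \leftarrow \mathbf{A}' - \mathbf{w}\mathbf{u}^T - \mathbf{u}\mathbf{w}^T$.

10. Set $A_{i,i+1} = A_{i+1,i} = -k$.

Accumulated Transformation Matrix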

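Since we used similarity transformations, the eigenvalues of the tridiagonal matrix
are the same as those of the original matrix. However, to determine the eigenvectors
$\mathbf{X}$ of original $\mathbf{A}$ we must use the transformation

$$\mathbf{X} = \mathbf{P}\mathbf{X}_{\text{tridiag}}$$

where $\mathbf{P}$ is the accumulation of the individual transformations:

$$\mathbf{P} = \mathbf{P}_1\mathbf{P}_2\cdots\mathbf{P}_{n-2}$$

We build up the accumulated transformation matrix by initializing $\mathbf{P}$ to a $n \times n$
identity matrix and then applying the transformation

$$\mathbf{P} \leftarrow \mathbf{P}\mathbf{P}_i = \begin{bmatrix} \mathbf{P}_{11} & \mathbf{P}_{12} \\ \mathbf{P}_{21} & \mathbf{P}_{22} \end{bmatrix} \begin{bmatrix} \mathbf{I}_i & \mathbf{0}^T \\ \mathbf{0} & \mathbf{Q} \end{bmatrix} = \begin{bmatrix} \mathbf{P}_{11} & \mathbf{P}_{12}\mathbf{Q} \\ \mathbf{P}_{21} & \mathbf{P}_{22}\mathbf{Q} \end{bmatrix} \tag{b}$$

with $i = 1, 2, \ldots, n - 2$. It can be seen that each multiplication affects only the rightmost
$n - i$ columns of $\mathbf{P}$ (since the first row of $\mathbf{P}_{12}$ contains only zeroes, it can also be
omitted in the multiplication). Using the notation

$$\mathbf{P}' = \begin{bmatrix} \mathbf{P}_{12} \\ \mathbf{P}_{22} \end{bmatrix}$$

we have

$$\begin{bmatrix} \mathbf{P}_{12}\mathbf{Q} \\ \mathbf{P}_{22}\mathbf{Q} \end{bmatrix} = \mathbf{P}'\mathbf{Q} = \mathbf{P}'\left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H}\right) = \mathbf{P}' - \frac{\mathbf{P}'\mathbf{u}}{H}\mathbf{u}^T = \mathbf{P}' - \mathbf{y}\mathbf{u}^T \tag{9.47}$$

where

$$\mathbf{y} = \frac{\mathbf{P}'\mathbf{u}}{H} \tag{9.48}$$

The procedure for carrying out the matrix multiplication in Eq. (b) is

• Retrieve $\mathbf{u}$ (in our tridiagonalization procedure the $\mathbf{u}$'s are stored in the columns
of the lower triangular portion of $\mathbf{A}$).

• Compute $H = |\mathbf{u}|^2/2$.

• Compute $\mathbf{y} = \mathbf{P}'\mathbf{u}/H$.

• Compute the transformation $\mathbf{P}' \leftarrow \mathbf{P}' - \mathbf{y}\mathbf{u}^T$.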
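
Before turning to the efficient implementation, it can be instructive to carry out the reduction in the "wasteful" way, forming each $\mathbf{P}_i$ explicitly. The script below is such a sketch, written only to make the structure of the method visible. It uses the matrix of Example 9.7; the names `T` and `Pk` are illustrative choices.

```matlab
% Householder reduction with explicitly formed P_i (illustrative sketch;
% deliberately inefficient, intended only to expose the structure).
A = [7 2 3 -1; 2 8 5 1; 3 5 12 9; -1 1 9 7];
n = size(A,1);
T = A;                                % will become tridiagonal
P = eye(n);                           % accumulated transformation
for i = 1:n-2
    x = T(i+1:n,i);                   % column just to the left of A'
    k = norm(x); if x(1) < 0; k = -k; end
    u = x; u(1) = u(1) + k;           % u = x + k*e1
    H = dot(u,u)/2;
    Pk = eye(n);
    Pk(i+1:n,i+1:n) = eye(n-i) - u*u'/H;   % P_i = [I_i 0; 0 Q]
    T = Pk*T*Pk;                      % similarity transformation
    P = P*Pk;                         % P = P_1*P_2*...*P_(n-2)
end
d = diag(T)';                         % principal diagonal
c = diag(T,1)';                       % upper subdiagonal
```

Since only similarity transformations were applied, `P'*A*P` should reproduce `T`, and `eig(T)` should agree with `eig(A)` to within roundoff.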


■ householder 

This function performs the Householder reduction on the matrix A. Upon return,
d occupies the principal diagonal of A and c forms the upper subdiagonal; that is,
d = diag(A) and c = diag(A,1). The portion of A below the principal diagonal is
utilized to store the vectors u that are needed in the computation of the transformation
matrix P.

function A = householder(A)
% Householder reduction of A to tridiagonal form A = [c\d\c].
% Extract c and d by d = diag(A), c = diag(A,1).
% USAGE: A = householder(A)

n = size(A,1);
for k = 1:n-2
    u = A(k+1:n,k);
    uMag = sqrt(dot(u,u));
    if u(1) < 0; uMag = -uMag; end
    u(1) = u(1) + uMag;
    A(k+1:n,k) = u;                  % Save u in lower part of A.
    H = dot(u,u)/2;
    v = A(k+1:n,k+1:n)*u/H;
    g = dot(u,v)/(2*H);
    v = v - g*u;
    A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - v*u' - u*v';
    A(k,k+1) = -uMag;
end

■ householderP 

The function householderP returns the accumulated transformation matrix P. There
is no need to call it if only the eigenvalues are to be computed. Note that the input
parameter A is not the original matrix, but the matrix returned by householder.

function P = householderP(A)
% Computes transformation matrix P after
% householder reduction of A is carried out.
% USAGE: P = householderP(A).

n = size(A,1);
P = eye(n);
for k = 1:n-2
    u = A(k+1:n,k);
    H = dot(u,u)/2;
    v = P(1:n,k+1:n)*u/H;
    P(1:n,k+1:n) = P(1:n,k+1:n) - v*u';
end

EXAMPLE 9.7 

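Transform the matrix

$$\mathbf{A} = \begin{bmatrix} 7 & 2 & 3 & -1 \\ 2 & 8 & 5 & 1 \\ 3 & 5 & 12 & 9 \\ -1 & 1 & 9 & 7 \end{bmatrix}$$

into tridiagonal form using Householder reduction.

Solution Reduce the first row and column:

$$\mathbf{A}' = \begin{bmatrix} 8 & 5 & 1 \\ 5 & 12 & 9 \\ 1 & 9 & 7 \end{bmatrix} \qquad \mathbf{x} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} \qquad k = |\mathbf{x}| = 3.7417$$

$$\mathbf{u} = \begin{bmatrix} k + x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5.7417 \\ 3 \\ -1 \end{bmatrix} \qquad H = \frac{1}{2}|\mathbf{u}|^2 = 21.484$$

$$\mathbf{u}\mathbf{u}^T = \begin{bmatrix} 32.967 & 17.225 & -5.7417 \\ 17.225 & 9 & -3 \\ -5.7417 & -3 & 1 \end{bmatrix}$$

$$\mathbf{Q} = \mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H} = \begin{bmatrix} -0.53450 & -0.80176 & 0.26725 \\ -0.80176 & 0.58108 & 0.13964 \\ 0.26725 & 0.13964 & 0.95345 \end{bmatrix}$$

$$\mathbf{Q}\mathbf{A}'\mathbf{Q} = \begin{bmatrix} 10.642 & -0.1388 & -9.1294 \\ -0.1388 & 5.9087 & 4.8429 \\ -9.1294 & 4.8429 & 10.4480 \end{bmatrix}$$

$$\mathbf{A} \leftarrow \begin{bmatrix} A_{11} & \left(\mathbf{Q}\mathbf{x}\right)^T \\ \mathbf{Q}\mathbf{x} & \mathbf{Q}\mathbf{A}'\mathbf{Q} \end{bmatrix} = \begin{bmatrix} 7 & -3.7417 & 0 & 0 \\ -3.7417 & 10.642 & -0.1388 & -9.1294 \\ 0 & -0.1388 & 5.9087 & 4.8429 \\ 0 & -9.1294 & 4.8429 & 10.4480 \end{bmatrix}$$

In the last step we used the formula $\mathbf{Q}\mathbf{x} = \begin{bmatrix} -k & 0 & \cdots & 0 \end{bmatrix}^T$.

Reduce the second row and column:

$$\mathbf{A}' = \begin{bmatrix} 5.9087 & 4.8429 \\ 4.8429 & 10.4480 \end{bmatrix} \qquad \mathbf{x} = \begin{bmatrix} -0.1388 \\ -9.1294 \end{bmatrix} \qquad k = -|\mathbf{x}| = -9.1305$$

where the negative sign on $k$ was determined by the sign of $x_1$.

$$\mathbf{u} = \begin{bmatrix} k + x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -9.2693 \\ -9.1294 \end{bmatrix} \qquad H = \frac{1}{2}|\mathbf{u}|^2 = 84.633$$

$$\mathbf{u}\mathbf{u}^T = \begin{bmatrix} 85.920 & 84.623 \\ 84.623 & 83.346 \end{bmatrix}$$

$$\mathbf{Q} = \mathbf{I} - \frac{\mathbf{u}\mathbf{u}^T}{H} = \begin{bmatrix} -0.01521 & -0.99988 \\ -0.99988 & 0.01521 \end{bmatrix}$$

$$\mathbf{Q}\mathbf{A}'\mathbf{Q} = \begin{bmatrix} 10.594 & 4.772 \\ 4.772 & 5.762 \end{bmatrix}$$

$$\mathbf{A} \leftarrow \begin{bmatrix} A_{11} & A_{12} & \mathbf{0}^T \\ A_{21} & A_{22} & \left(\mathbf{Q}\mathbf{x}\right)^T \\ \mathbf{0} & \mathbf{Q}\mathbf{x} & \mathbf{Q}\mathbf{A}'\mathbf{Q} \end{bmatrix} = \begin{bmatrix} 7 & -3.742 & 0 & 0 \\ -3.742 & 10.642 & 9.131 & 0 \\ 0 & 9.131 & 10.594 & 4.772 \\ 0 & 0 & 4.772 & 5.762 \end{bmatrix}$$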
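
The first reduction step of this example can be cross-checked by evaluating $\mathbf{Q}\mathbf{A}'\mathbf{Q}$ in two ways: directly, by forming $\mathbf{Q}$, and through the economical update of Eq. (9.46). The sketch below does both; the variable names `fast` and `slow` are illustrative, and the script is meant as a check rather than as a replacement for the function householder.

```matlab
% First Householder step of Example 9.7 computed two ways (sketch).
Ap = [8 5 1; 5 12 9; 1 9 7];     % A' = A without its first row and column
x = [2; 3; -1];                  % first column of A below the diagonal
k = norm(x); if x(1) < 0; k = -k; end
u = x; u(1) = u(1) + k;          % u = x + k*e1
H = dot(u,u)/2;
v = Ap*u/H;                      % Eq. (9.43)
g = dot(u,v)/(2*H);              % Eq. (9.44)
w = v - g*u;                     % Eq. (9.45)
fast = Ap - w*u' - u*w';         % Eq. (9.46): no matrix-matrix products
Q = eye(3) - u*u'/H;             % Eq. (9.36)
slow = Q*Ap*Q;                   % direct evaluation for comparison
diffNorm = norm(fast - slow);
```

The two results should differ only by roundoff, while the update of Eq. (9.46) needs just one matrix-vector product and two outer products.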


EXAMPLE 9.8 

Use the function householder to tridiagonalize the matrix in Example 9.7; also determine
the transformation matrix P.

Solution

% Example 9.8 (Householder reduction)
A = [ 7 2  3 -1;
      2 8  5  1;
      3 5 12  9;
     -1 1  9  7];
A = householder(A);
d = diag(A)'
c = diag(A,1)'
P = householderP(A)

The results of running the above program are:

>> d =
    7.0000   10.6429   10.5942    5.7629
c =
   -3.7417    9.1309    4.7716
P =
    1.0000         0         0         0
         0   -0.5345   -0.2551    0.8057
         0   -0.8018   -0.1484   -0.5789
         0    0.2673   -0.9555   -0.1252

9.5 Eigenvalues of Symmetric Tridiagonal Matrices 

Sturm Sequence 

In principle, the eigenvalues of a matrix $\mathbf{A}$ can be determined by finding the roots of
the characteristic equation $|\mathbf{A} - \lambda\mathbf{I}| = 0$. This method is impractical for large matrices
since the evaluation of the determinant involves $n^3/3$ multiplications. However, if the
matrix is tridiagonal (we also assume it to be symmetric), its characteristic polynomial

$$P_n(\lambda) = |\mathbf{A} - \lambda\mathbf{I}| = \begin{vmatrix} d_1 - \lambda & c_1 & 0 & 0 & \cdots & 0 \\ c_1 & d_2 - \lambda & c_2 & 0 & \cdots & 0 \\ 0 & c_2 & d_3 - \lambda & c_3 & \cdots & 0 \\ 0 & 0 & c_3 & d_4 - \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & c_{n-1} & d_n - \lambda \end{vmatrix}$$

can be computed with only $3(n - 1)$ multiplications using the following sequence of
operations:

$$P_0(\lambda) = 1$$
$$P_1(\lambda) = d_1 - \lambda \tag{9.49}$$
$$P_i(\lambda) = (d_i - \lambda)P_{i-1}(\lambda) - c_{i-1}^2 P_{i-2}(\lambda), \quad i = 2, 3, \ldots, n$$

The polynomials $P_0(\lambda), P_1(\lambda), \ldots, P_n(\lambda)$ form a Sturm sequence that has the following
property:

• The number of sign changes in the sequence $P_0(a), P_1(a), \ldots, P_n(a)$ is equal to the
number of roots of $P_n(\lambda)$ that are smaller than $a$. If a member $P_i(a)$ of the sequence
is zero, its sign is to be taken opposite to that of $P_{i-1}(a)$.

As we see shortly, the Sturm sequence property makes it relatively easy to bracket the
eigenvalues of a tridiagonal matrix.

■ sturmSeq

Given the diagonals c and d of A = [c\d\c], and the value of $\lambda$, this function returns
the Sturm sequence $P_0(\lambda), P_1(\lambda), \ldots, P_n(\lambda)$. Note that $P_n(\lambda) = |\mathbf{A} - \lambda\mathbf{I}|$.

function p = sturmSeq(c,d,lambda)
% Returns Sturm sequence p associated with
% the tridiagonal matrix A = [c\d\c] and lambda.
% USAGE: p = sturmSeq(c,d,lambda).
% Note that |A - lambda*I| = p(n).

n = length(d) + 1;
p = ones(n,1);
p(2) = d(1) - lambda;
for i = 2:n-1
    p(i+1) = (d(i) - lambda)*p(i) - (c(i-1)^2)*p(i-1);
end


■ count_eVals 

This function counts the number of sign changes in the Sturm sequence and returns
the number of eigenvalues of the matrix A = [c\d\c] that are smaller than $\lambda$.

function num_eVals = count_eVals(c,d,lambda)
% Counts eigenvalues smaller than lambda of matrix
% A = [c\d\c]. Uses the Sturm sequence.
% USAGE: num_eVals = count_eVals(c,d,lambda).

p = sturmSeq(c,d,lambda);
n = length(p);
oldSign = 1; num_eVals = 0;
for i = 2:n
    pSign = sign(p(i));
    if pSign == 0; pSign = -oldSign; end
    if pSign*oldSign < 0
        num_eVals = num_eVals + 1;
    end
    oldSign = pSign;
end

EXAMPLE 9.9

Use the Sturm sequence property to show that the smallest eigenvalue of A is in the
interval (0.25, 0.5), where

$$\mathbf{A} = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$$

Solution Taking $\lambda = 0.5$, we have $d_i - \lambda = 1.5$ and $c_{i-1}^2 = 1$ and the Sturm sequence
in Eqs. (9.49) becomes

$$P_0(0.5) = 1$$
$$P_1(0.5) = 1.5$$
$$P_2(0.5) = 1.5(1.5) - 1 = 1.25$$
$$P_3(0.5) = 1.5(1.25) - 1.5 = 0.375$$
$$P_4(0.5) = 1.5(0.375) - 1.25 = -0.6875$$

Since the sequence contains one sign change, there exists one eigenvalue smaller
than 0.5.

Repeating the process with $\lambda = 0.25$ ($d_i - \lambda = 1.75$, $c_{i-1}^2 = 1$), we get

$$P_0(0.25) = 1$$
$$P_1(0.25) = 1.75$$
$$P_2(0.25) = 1.75(1.75) - 1 = 2.0625$$
$$P_3(0.25) = 1.75(2.0625) - 1.75 = 1.8594$$
$$P_4(0.25) = 1.75(1.8594) - 2.0625 = 1.1915$$

There are no sign changes in the sequence, so that all the eigenvalues are greater than
0.25. We thus conclude that $0.25 < \lambda_1 < 0.5$.
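
The hand computation for $\lambda = 0.5$ can be repeated in a short, self-contained script that evaluates the recurrence of Eqs. (9.49), counts the sign changes, and compares the count with the eigenvalues returned by MATLAB's built-in `eig`. This is only a sketch for checking the arithmetic; the variable names are illustrative.

```matlab
% Sturm sequence of Example 9.9 at lambda = 0.5 (illustrative sketch).
d = [2 2 2 2];  c = [-1 -1 -1];      % diagonals of A = [c\d\c]
lambda = 0.5;
n = length(d);
p = ones(n+1,1);                     % p(1) holds P_0 = 1
p(2) = d(1) - lambda;                % P_1
for i = 2:n
    p(i+1) = (d(i) - lambda)*p(i) - c(i-1)^2*p(i-1);
end
% Count sign changes; a zero takes the sign opposite to its predecessor
numChanges = 0; oldSign = 1;
for i = 2:n+1
    pSign = sign(p(i));
    if pSign == 0; pSign = -oldSign; end
    if pSign*oldSign < 0; numChanges = numChanges + 1; end
    oldSign = pSign;
end
% Independent check with the built-in eigenvalue solver
A = diag(d) + diag(c,1) + diag(c,-1);
numBelow = sum(eig(A) < lambda);     % eigenvalues smaller than lambda
```

The two counts, `numChanges` and `numBelow`, should agree.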

Gerschgorin's Theorem 

Gerschgorin's theorem is useful in determining the global bounds on the eigenvalues
of an $n \times n$ matrix $\mathbf{A}$. The term "global" means the bounds that enclose all the eigenvalues.
We give here a simplified version of the theorem for a symmetric matrix.

• If $\lambda$ is an eigenvalue of $\mathbf{A}$, then

$$a_i - r_i \leq \lambda \leq a_i + r_i, \quad i = 1, 2, \ldots, n$$

where

$$a_i = A_{ii} \qquad r_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} \left|A_{ij}\right| \tag{9.50}$$

It follows that the global bounds on the eigenvalues are

$$\lambda_{\min} \geq \min_i\left(a_i - r_i\right) \qquad \lambda_{\max} \leq \max_i\left(a_i + r_i\right) \tag{9.51}$$
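
Equations (9.50) and (9.51) apply to any symmetric matrix, not only to tridiagonal ones. The following sketch evaluates the bounds for a full matrix stored in the ordinary way and compares them with the eigenvalues from the built-in `eig`. The test matrix is an arbitrary symmetric example and the variable names are illustrative.

```matlab
% Gerschgorin bounds for a symmetric matrix in full storage (sketch).
A = [4 -2 0; -2 4 -2; 0 -2 5];       % assumed symmetric test matrix
a = diag(A);                         % a_i = A_ii
r = sum(abs(A),2) - abs(a);          % r_i = sum of |A_ij| over j ~= i
eValMin = min(a - r);                % lower global bound, Eq. (9.51)
eValMax = max(a + r);                % upper global bound, Eq. (9.51)
lam = eig(A);                        % reference eigenvalues
inside = all(lam >= eValMin & lam <= eValMax);
```

For this matrix the bounds evaluate to 0 and 8, and `inside` should be true.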


■ gerschgorin 

The function gerschgorin returns the lower and the upper global bounds on the
eigenvalues of a symmetric tridiagonal matrix A = [c\d\c].

function [eValMin,eValMax]= gerschgorin(c,d)
% Evaluates the global bounds on eigenvalues
% of A = [c\d\c].
% USAGE: [eValMin,eValMax]= gerschgorin(c,d).

n = length(d);
eValMin = d(1) - abs(c(1));
eValMax = d(1) + abs(c(1));
for i = 2:n-1
    eVal = d(i) - abs(c(i)) - abs(c(i-1));
    if eVal < eValMin; eValMin = eVal; end
    eVal = d(i) + abs(c(i)) + abs(c(i-1));
    if eVal > eValMax; eValMax = eVal; end
end
eVal = d(n) - abs(c(n-1));
if eVal < eValMin; eValMin = eVal; end
eVal = d(n) + abs(c(n-1));
if eVal > eValMax; eValMax = eVal; end


EXAMPLE 9.10 

Use Gerschgorin's theorem to determine the global bounds on the eigenvalues of the
matrix

$$\mathbf{A} = \begin{bmatrix} 4 & -2 & 0 \\ -2 & 4 & -2 \\ 0 & -2 & 5 \end{bmatrix}$$

Solution Referring to Eqs. (9.50), we get

$$a_1 = 4 \qquad a_2 = 4 \qquad a_3 = 5$$
$$r_1 = 2 \qquad r_2 = 4 \qquad r_3 = 2$$

Hence

$$\lambda_{\min} \geq \min(a_i - r_i) = 4 - 4 = 0$$
$$\lambda_{\max} \leq \max(a_i + r_i) = 4 + 4 = 8$$

Bracketing Eigenvalues 

The Sturm sequence property together with Gerschgorin’s theorem provides us con¬ 
venient tools for bracketing each eigenvalue of a symmetric tridiagonal matrix. 


■ eValBrackets 

The function eValBrackets brackets the m smallest eigenvalues of a symmetric
tridiagonal matrix A = [c\d\c]. It returns the sequence $r_1, r_2, \ldots, r_{m+1}$, where each
interval $(r_i, r_{i+1})$ contains exactly one eigenvalue. The algorithm first finds the global
bounds on the eigenvalues by Gerschgorin’s theorem. The method of bisection in
conjunction with the Sturm sequence property is then used to determine the upper
bounds on $\lambda_m, \lambda_{m-1}, \ldots, \lambda_1$ in that order.

function r = eValBrackets(c,d,m)
% Brackets each of the m lowest eigenvalues of A = [c\d\c]
% so that there is one eigenvalue in [r(i), r(i+1)].
% USAGE: r = eValBrackets(c,d,m).

[eValMin,eValMax]= gerschgorin(c,d);   % Find global limits
r = ones(m+1,1); r(1) = eValMin;
% Search for eigenvalues in descending order
for k = m:-1:1
    % First bisection of interval (eValMin,eValMax)
    eVal = (eValMax + eValMin)/2;
    h = (eValMax - eValMin)/2;
    for i = 1:100
        % Find number of eigenvalues less than eVal
        num_eVals = count_eVals(c,d,eVal);
        % Bisect again & find the half containing eVal
        h = h/2;
        if num_eVals < k ; eVal = eVal + h;
        elseif num_eVals > k ; eVal = eVal - h;
        else; break
        end
    end
    % If eigenvalue located, change upper limit of
    % search and record result in {r}
    eValMax = eVal; r(k+1) = eVal;
end


EXAMPLE 9.11 

Bracket each eigenvalue of the matrix in Example 9.10. 

Solution In Example 9.10 we found that all the eigenvalues lie in (0, 8). We now bisect
this interval and use the Sturm sequence to determine the number of eigenvalues in
(0, 4). With $\lambda = 4$, the sequence is (see Eqs. (9.49))

$$P_0(4) = 1$$
$$P_1(4) = 4 - 4 = 0$$
$$P_2(4) = (4 - 4)(0) - 2^2(1) = -4$$
$$P_3(4) = (5 - 4)(-4) - 2^2(0) = -4$$

Since a zero value is assigned the sign opposite to that of the preceding member, the
signs in this sequence are (+, −, −, −). The one sign change shows the presence of
one eigenvalue in (0, 4).

Next we bisect the interval (4, 8) and compute the Sturm sequence with $\lambda = 6$:

$$P_0(6) = 1$$
$$P_1(6) = 4 - 6 = -2$$
$$P_2(6) = (4 - 6)(-2) - 2^2(1) = 0$$
$$P_3(6) = (5 - 6)(0) - 2^2(-2) = 8$$

In this sequence the signs are (+, −, +, +), indicating two eigenvalues in (0, 6).
Therefore

$$0 < \lambda_1 < 4 \qquad 4 < \lambda_2 < 6 \qquad 6 < \lambda_3 < 8$$
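
The sign counting in this example can be reproduced with a short script. This is only a sketch written for this example, not the sturmSeq and count_eVals functions of the text; the variable names (lambdas, counts, nChanges) are assumptions made here. It evaluates the Sturm sequence for the two trial values and counts sign changes, treating a zero as having the sign opposite to its predecessor.

```matlab
% Sturm sequence sign counts for Example 9.11 (illustrative sketch)
d = [4; 4; 5];                 % diagonal of A = [c\d\c]
c = [-2; -2];                  % off-diagonal of A
lambdas = [4 6];               % trial values used in the example
counts = zeros(size(lambdas));
n = numel(d);
for k = 1:numel(lambdas)
    lam = lambdas(k);
    p = ones(n+1,1);           % p(1) holds P_0 = 1
    p(2) = d(1) - lam;         % P_1
    for i = 2:n                % P_i = (d_i - lam)*P_(i-1) - c_(i-1)^2*P_(i-2)
        p(i+1) = (d(i) - lam)*p(i) - c(i-1)^2*p(i-1);
    end
    s = 1; nChanges = 0;       % s = sign of the preceding member
    for i = 2:n+1
        sNew = sign(p(i));
        if sNew == 0; sNew = -s; end   % zero takes the opposite sign
        if sNew ~= s; nChanges = nChanges + 1; end
        s = sNew;
    end
    counts(k) = nChanges;      % number of eigenvalues smaller than lam
end
disp(counts)
```

The two counts, one eigenvalue below 4 and two below 6, are exactly the information used above to form the three brackets.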

Computation of Eigenvalues

Once the desired eigenvalues are bracketed, they can be found by determining the
roots of $P_n(\lambda) = 0$ with bisection or Brent’s method.
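
As a quick illustration of the idea, the brackets of Example 9.11 can be handed to a bracketing root finder. The sketch below is an assumption-laden stand-in for the functions that follow: it uses MATLAB’s built-in fzero instead of the brent function of this text, and it evaluates $P_3(\lambda)$ as the determinant $|\mathbf{A} - \lambda\mathbf{I}|$ of the small full matrix rather than by the Sturm recurrence.

```matlab
% Eigenvalues of the Example 9.10 matrix from their brackets (sketch)
A = [4 -2 0; -2 4 -2; 0 -2 5];
P3 = @(lam) det(A - lam*eye(3));   % |A - lam*I|, equal to P_3(lam)
r = [0 4 6 8];                     % brackets found in Example 9.11
lam = zeros(3,1);
for i = 1:3
    % P3 changes sign over each bracket, as fzero requires
    lam(i) = fzero(P3,[r(i) r(i+1)]);
end
disp(lam')
```

Forming a determinant is practical only for a tiny matrix such as this one; for large tridiagonal matrices the Sturm sequence is the economical way to evaluate $P_n(\lambda)$.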


■ eigenvals3 

The function eigenvals3 computes the m smallest eigenvalues of a symmetric 
tridiagonal matrix with the method of Brent. 


function eVals = eigenvals3(C,D,m)
% Computes the smallest m eigenvalues of A = [C\D\C].
% USAGE: eVals = eigenvals3(C,D,m).
% C and D must be declared 'global' in calling program.

eVals = zeros(m,1);
r = eValBrackets(C,D,m);             % Bracket eigenvalues
for i = 1:m
    % Solve |A - eVal*I| = 0 for eVal by Brent's method
    eVals(i) = brent(@func,r(i),r(i+1));
end

function f = func(eVal)
% Returns |A - eVal*I| (last element of Sturm seq.)
global C D
p = sturmSeq(C,D,eVal);
f = p(length(p));


EXAMPLE 9.12 

Determine the three smallest eigenvalues of the 100 x 100 matrix 


$$\mathbf{A} = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 2
\end{bmatrix}$$

Solution

% Example 9.12 (Eigenvals. of tridiagonal matrix)
format short e
global C D
m = 3; n = 100;
D = ones(n,1)*2;
C = -ones(n-1,1);
eigenvalues = eigenvals3(C,D,m)'

The result is

>> eigenvalues =
  9.6744e-004  3.8688e-003  8.7013e-003
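
These values can be cross-checked independently. For this particular matrix (2 on the diagonal, −1 beside it) the eigenvalues are known in closed form, $\lambda_k = 2 - 2\cos[k\pi/(n+1)]$; this is a standard result that is not derived in this text. The sketch below compares the formula with MATLAB’s eig applied to the full matrix; the names lamExact and lamEig are chosen here for illustration.

```matlab
% Cross-check of Example 9.12 against the closed-form eigenvalues (sketch)
n = 100; k = (1:3)';
lamExact = 2 - 2*cos(k*pi/(n + 1));          % closed-form eigenvalues
A = 2*eye(n) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
lamEig = sort(eig(A));                       % all eigenvalues, ascending
format short e
disp([lamExact lamEig(1:3)])
```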

Computation of Eigenvectors 

If the eigenvalues are known (approximate values will be good enough), the best 
means of computing the corresponding eigenvectors is the inverse power method 
with eigenvalue shifting. This method was discussed before, but the algorithm did 
not take advantage of banding. Here we present a version of the method written for 
symmetric tridiagonal matrices. 
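
Before turning to the banded version, the iteration itself can be sketched on a small full matrix. The script below is an illustration only: it uses the backslash operator in place of the LUdec3 and LUsol3 functions, a fixed start vector instead of random numbers, and a shift s = 5 picked here as the midpoint of the bracket (4, 6) found in Example 9.11.

```matlab
% Inverse power method with eigenvalue shifting on a full matrix (sketch)
A = [4 -2 0; -2 4 -2; 0 -2 5];     % matrix of Example 9.10
s = 5.0;                           % shift (assumed; midpoint of a bracket)
B = A - s*eye(3);                  % shifted matrix A* = A - s*I
x = ones(3,1)/sqrt(3);             % fixed unit start vector
eVal = NaN;
for i = 1:50
    xOld = x;                      % save current x
    x = B\xOld;                    % solve A* x = xOld
    xMag = norm(x); x = x/xMag;    % normalize x
    xSign = sign(dot(xOld,x));     % detect sign change of x
    x = x*xSign;
    if norm(xOld - x) < 1.0e-10    % check for convergence
        eVal = s + xSign/xMag;     % undo the shift
        break
    end
end
fprintf('eigenvalue closest to %g: %.6f\n',s,eVal)
```

The closer the shift lies to the wanted eigenvalue, the fewer iterations are needed, which is why an approximate eigenvalue is good enough as input.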

■ invPower3 

This function is very similar to invPower listed in Art. 9.3, but executes much faster 
since it exploits the tridiagonal structure of the matrix. 

function [eVal,eVec] = invPower3(c,d,s,maxIter,tol)
% Computes the eigenvalue of A =[c\d\c] closest to s and
% the associated eigenvector by the inverse power method.
% USAGE: [eVal,eVec] = invPower3(c,d,s,maxIter,tol).
% maxIter = limit on number of iterations (default is 50).
% tol = error tolerance (default is 1.0e-6).

if nargin < 5; tol = 1.0e-6; end
if nargin < 4; maxIter = 50; end
n = length(d);
e = c; d = d - s;                      % Apply shift to diag. terms of A
[c,d,e] = LUdec3(c,d,e);               % Decompose A* = A - sI
x = rand(n,1);                         % Seed x with random numbers
xMag = sqrt(dot(x,x)); x = x/xMag;     % Normalize x
for i = 1:maxIter
    xOld = x;                          % Save current x
    x = LUsol3(c,d,e,x);               % Solve A*x = xOld
    xMag = sqrt(dot(x,x)); x = x/xMag; % Normalize x
    xSign = sign(dot(xOld,x));         % Detect sign change of x
    x = x*xSign;
    % Check for convergence
    if sqrt(dot(xOld - x,xOld - x)) < tol
        eVal = s + xSign/xMag; eVec = x;
        return
    end
end
error('Too many iterations')

EXAMPLE 9.13 

Compute the 10th smallest eigenvalue of the matrix A given in Example 9.12. 

Solution The following program extracts the mth eigenvalue of A by the inverse
power method with eigenvalue shifting:

% Example 9.13 (Eigenvals. of tridiagonal matrix)
format short e
m = 10
n = 100;
d = ones(n,1)*2; c = -ones(n-1,1);
r = eValBrackets(c,d,m);
s = (r(m) + r(m+1))/2;
[eVal,eVec] = invPower3(c,d,s);
mth_eigenvalue = eVal

The result is

>> m =
    10
mth_eigenvalue =
  9.5974e-002

EXAMPLE 9.14 

Compute the three smallest eigenvalues and the corresponding eigenvectors of the 
matrix A in Example 9.5. 

Solution

% Example 9.14 (Eigenvalue problem)
global C D
m = 3;
A = [11 2  3  1  4;
      2 9  3  5  2;
      3 3 15  4  3;
      1 5  4 12  4;
      4 2  3  4 17];
eVecMat = zeros(size(A,1),m);        % Init. eigenvector matrix.
A = householder(A);                  % Tridiagonalize A.
D = diag(A); C = diag(A,1);          % Extract diagonals of A.
P = householderP(A);                 % Compute transf. matrix P.
eVals = eigenvals3(C,D,m);           % Find lowest m eigenvals.
for i = 1:m                          % Compute corresponding
    s = eVals(i)*1.0000001;          % eigenvectors by inverse
    [eVal,eVec] = invPower3(C,D,s);  % power method with
    eVecMat(:,i) = eVec;             % eigenvalue shifting.
end
eVecMat = P*eVecMat;                 % Eigenvectors of orig. A.
eigenvalues = eVals'
eigenvectors = eVecMat

>> eigenvalues =
    4.8739    8.6636   10.9368
eigenvectors =
   -0.2673    0.7291    0.5058
    0.7414    0.4139   -0.3188
    0.0502   -0.4299    0.5208
   -0.5949    0.0696   -0.6029
    0.1497   -0.3278   -0.0884


PROBLEM SET 9.2 

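1. Use Gerschgorin’s theorem to determine global bounds on the eigenvalues of

$$\text{(a)}\ \mathbf{A} = \begin{bmatrix} 10 & 4 & -1 \\ 4 & 2 & 3 \\ -1 & 3 & 6 \end{bmatrix} \qquad \text{(b)}\ \mathbf{B} = \begin{bmatrix} 4 & 2 & -2 \\ 2 & 5 & 3 \\ -2 & 3 & 4 \end{bmatrix}$$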

2. Use the Sturm sequence to show that

$$\mathbf{A} = \begin{bmatrix} 5 & -2 & 0 & 0 \\ -2 & 4 & -1 & 0 \\ 0 & -1 & 4 & -2 \\ 0 & 0 & -2 & 5 \end{bmatrix}$$

has one eigenvalue in the interval (2, 4).

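3. Bracket each eigenvalue of

$$\mathbf{A} = \begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}$$

4. Bracket each eigenvalue of

$$\mathbf{A} = \begin{bmatrix} 6 & 1 & 0 \\ 1 & 8 & 2 \\ 0 & 2 & 9 \end{bmatrix}$$

5. Bracket every eigenvalue of

$$\mathbf{A} = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$

6. Tridiagonalize the matrix

$$\mathbf{A} = \begin{bmatrix} 12 & 4 & 3 \\ 4 & 9 & 3 \\ 3 & 3 & 15 \end{bmatrix}$$

with Householder’s reduction.

7. Use Householder’s reduction to transform the matrix

$$\mathbf{A} = \begin{bmatrix} 4 & -2 & 1 & -1 \\ -2 & 4 & -2 & 1 \\ 1 & -2 & 4 & -2 \\ -1 & 1 & -2 & 4 \end{bmatrix}$$

to tridiagonal form.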

8. ■ Compute all the eigenvalues of

$$\mathbf{A} = \begin{bmatrix} 6 & 2 & 0 & 0 & 0 \\ 2 & 5 & 2 & 0 & 0 \\ 0 & 2 & 7 & 4 & 0 \\ 0 & 0 & 4 & 6 & 1 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}$$

9. ■ Find the smallest two eigenvalues of

$$\mathbf{A} = \begin{bmatrix} 4 & -1 & 0 & 1 \\ -1 & 6 & -2 & 0 \\ 0 & -2 & 3 & 2 \\ 1 & 0 & 2 & 4 \end{bmatrix}$$

10. ■ Compute the three smallest eigenvalues of

$$\mathbf{A} = \begin{bmatrix}
7 & -4 & 3 & -2 & 1 & 0 \\
-4 & 8 & -4 & 3 & -2 & 1 \\
3 & -4 & 9 & -4 & 3 & -2 \\
-2 & 3 & -4 & 10 & -4 & 3 \\
1 & -2 & 3 & -4 & 11 & -4 \\
0 & 1 & -2 & 3 & -4 & 12
\end{bmatrix}$$

and the corresponding eigenvectors.

11. ■ Find the two smallest eigenvalues of the 6 x 6 Hilbert matrix

$$\mathbf{A} = \begin{bmatrix}
1 & 1/2 & 1/3 & \cdots & 1/6 \\
1/2 & 1/3 & 1/4 & \cdots & 1/7 \\
1/3 & 1/4 & 1/5 & \cdots & 1/8 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1/6 & 1/7 & 1/8 & \cdots & 1/11
\end{bmatrix}$$

Recall that this matrix is ill-conditioned.

12. ■ Rewrite the function eValBrackets so that it will bracket the m largest
eigenvalues of a tridiagonal matrix. Use this function to bracket the two largest
eigenvalues of the Hilbert matrix in Prob. 11.

13.

The differential equations of motion of the mass-spring system are

$$k(-2u_1 + u_2) = m\ddot{u}_1$$
$$k(u_1 - 2u_2 + u_3) = 3m\ddot{u}_2$$
$$k(u_2 - 2u_3) = 2m\ddot{u}_3$$

where $u_i(t)$ is the displacement of mass $i$ from its equilibrium position and $k$ is
the spring stiffness. Substituting $u_i(t) = y_i \sin \omega t$, we obtain the matrix eigenvalue
problem

$$\frac{k}{m}\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \omega^2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

Determine the circular frequencies $\omega$ and the corresponding relative amplitudes
$y_i$ of vibration.

14. ■

The figure shows n identical masses connected by springs of different stiffnesses.
The equation governing free vibration of the system is $\mathbf{A}\mathbf{u} = m\omega^2\mathbf{u}$, where $\omega$ is the
circular frequency and

$$\mathbf{A} = \begin{bmatrix}
k_1 + k_2 & -k_2 & 0 & 0 & \cdots & 0 \\
-k_2 & k_2 + k_3 & -k_3 & 0 & \cdots & 0 \\
0 & -k_3 & k_3 + k_4 & -k_4 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -k_{n-1} & k_{n-1} + k_n & -k_n \\
0 & 0 & \cdots & 0 & -k_n & k_n
\end{bmatrix}$$

Given the spring stiffness array $\mathbf{k} = \begin{bmatrix} k_1 & k_2 & \cdots & k_n \end{bmatrix}^T$, write a program that computes
the N lowest eigenvalues $\lambda = m\omega^2$ and the corresponding eigenvectors. Run the
program with N = 4 and

$$\mathbf{k} = \begin{bmatrix} 400 & 400 & 400 & 0.2 & 400 & 400 & 200 \end{bmatrix}^T \text{ kN/m}$$

Note that the system is weakly coupled, $k_4$ being small. Do the results make sense?

15. ■

The differential equation of motion of the axially vibrating bar is

$$u'' = \frac{\rho}{E}\ddot{u}$$

where $u(x, t)$ is the axial displacement, $\rho$ represents the mass density and $E$ is the
modulus of elasticity. The boundary conditions are $u(0, t) = u'(L, t) = 0$. Letting
$u(x, t) = y(x) \sin \omega t$, we obtain

$$y'' = -\omega^2 \frac{\rho}{E} y \qquad y(0) = y'(L) = 0$$

The corresponding finite difference equations are

$$\begin{bmatrix}
2 & -1 & 0 & 0 & \cdots & 0 \\
-1 & 2 & -1 & 0 & \cdots & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 2 & -1 \\
0 & 0 & \cdots & 0 & -1 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
= \left(\frac{\omega L}{n}\right)^2 \frac{\rho}{E}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n/2 \end{bmatrix}$$

(a) If the standard form of these equations is $\mathbf{H}\mathbf{z} = \lambda\mathbf{z}$, write down $\mathbf{H}$ and the
transformation matrix $\mathbf{P}$ in $\mathbf{y} = \mathbf{P}\mathbf{z}$. (b) Compute the lowest circular frequency of
the bar with n = 10, 100 and 1000 utilizing the function invPower3. Note: the
analytical solution is $\omega_1 = \pi\sqrt{E/\rho}/(2L)$.

16. ■

The simply supported column is resting on an elastic foundation of stiffness k
(N/m per meter length). An axial force P acts on the column. The differential
equation and the boundary conditions for the lateral displacement u are

$$u^{(4)} + \frac{P}{EI}u'' + \frac{k}{EI}u = 0$$

$$u(0) = u''(0) = u(L) = u''(L) = 0$$

Using the mesh shown, the finite difference approximation of these equations is

$$(5 + \alpha)u_1 - 4u_2 + u_3 = \lambda(2u_1 - u_2)$$
$$-4u_1 + (6 + \alpha)u_2 - 4u_3 + u_4 = \lambda(-u_1 + 2u_2 - u_3)$$
$$u_1 - 4u_2 + (6 + \alpha)u_3 - 4u_4 + u_5 = \lambda(-u_2 + 2u_3 - u_4)$$
$$\vdots$$
$$u_{n-3} - 4u_{n-2} + (6 + \alpha)u_{n-1} - 4u_n = \lambda(-u_{n-2} + 2u_{n-1} - u_n)$$
$$u_{n-2} - 4u_{n-1} + (5 + \alpha)u_n = \lambda(-u_{n-1} + 2u_n)$$

where

$$\alpha = \frac{kh^4}{EI} = \frac{1}{(n + 1)^4}\frac{kL^4}{EI} \qquad \lambda = \frac{Ph^2}{EI} = \frac{1}{(n + 1)^2}\frac{PL^2}{EI}$$

Write a program that computes the lowest three buckling loads P and the
corresponding mode shapes. Run the program with $kL^4/(EI) = 1000$ and n = 25.

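17. ■ Find the smallest five eigenvalues of the 20 x 20 matrix

$$\mathbf{A} = \begin{bmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 1 \\
1 & 2 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 2 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & \ddots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 2 & 1 & 0 \\
0 & 0 & \cdots & 0 & 1 & 2 & 1 \\
1 & 0 & \cdots & 0 & 0 & 1 & 2
\end{bmatrix}$$

Note: this is a difficult matrix that has many pairs of double eigenvalues.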


MATLAB Functions 

MATLAB’s function for solving eigenvalue problems is eig. Its usage for the standard
eigenvalue problem $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$ is

eVals = eig(A) returns the eigenvalues of the matrix A (A can be unsymmetric).
[X,D] = eig(A) returns the eigenvector matrix X and the diagonal matrix D that
contains the eigenvalues on its diagonal; that is, eVals = diag(D).

For the nonstandard form $\mathbf{A}\mathbf{x} = \lambda\mathbf{B}\mathbf{x}$, the calls are

eVals = eig(A,B)
[X,D] = eig(A,B)

The method of solution is based on Schur’s factorization: $\mathbf{P}\mathbf{A}\mathbf{P}^T = \mathbf{T}$, where $\mathbf{P}$
and $\mathbf{T}$ are unitary and triangular matrices, respectively. Schur’s factorization is not
covered in this text.
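
A short session illustrates both forms of the call. The matrix A is the one from Example 9.10; the matrix B is a hypothetical diagonal matrix invented here purely for illustration. The residual norms show that the returned pairs satisfy $\mathbf{A}\mathbf{X} = \mathbf{X}\mathbf{D}$ and $\mathbf{A}\mathbf{X} = \mathbf{B}\mathbf{X}\mathbf{D}$ to within roundoff.

```matlab
% Usage of eig for the standard and nonstandard forms (sketch)
A = [4 -2 0; -2 4 -2; 0 -2 5];     % matrix of Example 9.10
B = diag([2 1 1]);                 % hypothetical matrix, for illustration only
eVals = eig(A);                    % eigenvalues only
[X,D] = eig(A);                    % eigenvectors in columns of X
resStd = norm(A*X - X*D);          % residual of A*x = lambda*x
[Xg,Dg] = eig(A,B);                % nonstandard form
resGen = norm(A*Xg - B*Xg*Dg);     % residual of A*x = lambda*B*x
fprintf('residuals: %8.1e %8.1e\n',resStd,resGen)
```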








10 Introduction to Optimization 


Find $\mathbf{x}$ that minimizes $F(\mathbf{x})$ subject to $\mathbf{g}(\mathbf{x}) = \mathbf{0}$, $\mathbf{h}(\mathbf{x}) \le \mathbf{0}$


10.1 Introduction 

Optimization is the term often used for minimizing or maximizing a function. It is
sufficient to consider the problem of minimization only; maximization of $F(\mathbf{x})$ is achieved
by simply minimizing $-F(\mathbf{x})$. In engineering, optimization is closely related to design.
The function $F(\mathbf{x})$, called the merit function or objective function, is the quantity that
we wish to keep as small as possible, such as cost or weight. The components of $\mathbf{x}$,
known as the design variables, are the quantities that we are free to adjust. Physical
dimensions (lengths, areas, angles, etc.) are common examples of design variables.

Optimization is a large topic with many books dedicated to it. The best we can do in 
limited space is to introduce a few basic methods that are good enough for problems 
that are reasonably well behaved and don’t involve too many design variables. By 
omitting the more sophisticated methods, we may actually not miss all that much. 
All optimization algorithms are unreliable to a degree—any one of them may work on 
one problem and fail on another. As a rule of thumb, by going up in sophistication we 
gain computational efficiency, but not necessarily reliability. 

The algorithms for minimization are iterative procedures that require starting 
values of the design variables x. If F (x) has several local minima, the initial choice of 
x determines which of these will be computed. There is no guaranteed way of finding 
the global optimal point. One suggested procedure is to make several computer runs 
using different starting points and pick the best result. 

More often than not, the design is also subjected to restrictions, or constraints,
which may have the form of equalities or inequalities. As an example, take the
minimum weight design of a roof truss that has to carry a certain loading. Assume that
the layout of the members is given, so that the design variables are the cross-sectional
areas of the members. Here the design is dominated by inequality constraints that
consist of prescribed upper limits on the stresses and possibly the displacements.

The majority of available methods are designed for unconstrained optimization,
where no restrictions are placed on the design variables. In these problems the
minima, if they exist, are stationary points (points where the gradient vector of $F(\mathbf{x})$ vanishes).
In the more difficult problem of constrained optimization the minima are usually
located where the $F(\mathbf{x})$ surface meets the constraints. There are special algorithms for
constrained optimization, but they are not easily accessible due to their complexity
and specialization. One way to tackle a problem with constraints is to use an
unconstrained optimization algorithm, but modify the merit function so that any violation
of constraints is heavily penalized.

Consider the problem of minimizing $F(\mathbf{x})$ where the design variables are subject
to the constraints

$$g_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, M \tag{10.1a}$$
$$h_j(\mathbf{x}) \le 0, \quad j = 1, 2, \ldots, N \tag{10.1b}$$

We choose the new merit function to be

$$F^*(\mathbf{x}) = F(\mathbf{x}) + \lambda P(\mathbf{x}) \tag{10.2a}$$

where

$$P(\mathbf{x}) = \sum_{i=1}^{M} [g_i(\mathbf{x})]^2 + \sum_{j=1}^{N} \{\max[0, h_j(\mathbf{x})]\}^2 \tag{10.2b}$$

is the penalty function and $\lambda$ is a multiplier. The function max(a, b) returns the larger
of a and b. It is evident that $P(\mathbf{x}) = 0$ if no constraints are violated. Violation of a
constraint imposes a penalty proportional to the square of the violation. Hence the
minimization algorithm tends to avoid the violations, the degree of avoidance being
dependent on the magnitude of $\lambda$. If $\lambda$ is small, optimization will proceed faster
because there is more “space” in which the procedure can operate, but there may be
significant violation of constraints. On the other hand, a large $\lambda$ can result in a poorly
conditioned procedure, but the constraints will be tightly enforced. It is advisable to
run the optimization program with $\lambda$ that is on the small side. If the results show
unacceptable constraint violation, increase $\lambda$ and run the program again, starting with
the results of the previous run.
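
The construction of the penalized merit function is easily sketched with function handles. Everything in the example below is hypothetical: the merit function F, the equality constraint g, the inequality constraint h and the two test points were invented here to show how Eqs. (10.2) behave at a feasible and at an infeasible point.

```matlab
% Penalized merit function of Eqs. (10.2) (illustrative sketch)
F = @(x) (x(1) - 2)^2 + (x(2) - 1)^2;          % hypothetical merit function
g = @(x) x(1) + x(2) - 2;                      % hypothetical constraint g(x) = 0
h = @(x) x(1) - 1;                             % hypothetical constraint h(x) <= 0
P = @(x) sum(g(x).^2) + sum(max(0,h(x)).^2);   % penalty function, Eq. (10.2b)
Fstar = @(x,lam) F(x) + lam*P(x);              % new merit function, Eq. (10.2a)
xFeas = [0.5; 1.5];                            % satisfies both constraints
xViol = [2.0; 1.0];                            % violates both constraints
lam = 10;
fprintf('feasible:   F = %g, F* = %g\n',F(xFeas),Fstar(xFeas,lam))
fprintf('infeasible: F = %g, F* = %g\n',F(xViol),Fstar(xViol,lam))
```

At the feasible point the two functions coincide; at the infeasible point, which happens to be the unconstrained minimum of F, the penalty term dominates and grows in proportion to the multiplier.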

An optimization procedure may also become ill-conditioned when the
constraints have widely different magnitudes. This problem can be alleviated by scaling
the offending constraints; that is, multiplying the constraint equations by suitable
constants.

10.2 Minimization Along a Line

Figure 10.1. Example of local and global minima.

Consider the problem of minimizing a function $f(x)$ of a single variable $x$ with the
constraints $c \le x \le d$. A hypothetical plot of the function is shown in Fig. 10.1. There
are two minimum points: a stationary point characterized by $f'(x) = 0$ that represents
a local minimum, and a global minimum at the constraint boundary. It appears that
finding the global minimum is simple. All the stationary points could be located by
finding the roots of $df/dx = 0$, and each constraint boundary may be checked for a
global minimum by evaluating $f(c)$ and $f(d)$. Then why do we need an optimization
algorithm? We need it if $f(x)$ is difficult or impossible to differentiate; for example, if
$f$ represents a complex computer algorithm.


Bracketing 

Before a minimization algorithm can be entered, the minimum point must be
bracketed. The procedure of bracketing is simple: start with an initial value of $x_0$ and move
downhill computing the function at $x_1, x_2, x_3, \ldots$ until we reach the point $x_n$ where
$f(x)$ increases for the first time. The minimum point is now bracketed in the
interval $(x_{n-2}, x_n)$. What should the step size $h_i = x_{i+1} - x_i$ be? It is not a good idea to have
a constant $h_i$ since it often results in too many steps. A more efficient scheme is to
increase the size with every step, the goal being to reach the minimum quickly, even
if the resulting bracket is wide. We chose to increase the step size by a constant factor;
that is, we use $h_{i+1} = ch_i$, $c > 1$.
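
The effect of the growing step is easy to see in a small experiment. The sketch below is not the goldBracket function of this text: it assumes that the first step is already downhill, and the test function, the starting point and the first step size are all hypothetical choices made for illustration.

```matlab
% Bracketing a minimum with geometrically growing steps (sketch)
f = @(x) (x - 3).^2;       % hypothetical function with minimum at x = 3
c = 1.618033989;           % step growth factor, c > 1
x1 = 0.0; h = 0.1;         % arbitrary start and first step (downhill assumed)
x2 = x1 + h; f2 = f(x2);
a = NaN; b = NaN; nSteps = 0;
for i = 1:100
    h = c*h;                       % h_(i+1) = c*h_i
    x3 = x2 + h; f3 = f(x3);
    if f3 > f2                     % function increased for the first time
        a = x1; b = x3; nSteps = i;
        break
    end
    x1 = x2; x2 = x3; f2 = f3;     % shift the points and continue downhill
end
fprintf('bracket (%.4f, %.4f) after %d steps\n',a,b,nSteps)
```

With a constant step of 0.1 the same search would need roughly thirty steps to pass the minimum; the price of the growing step is a wider final bracket.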


Golden Section Search 

The golden section search is the counterpart of bisection used in finding roots of
equations. Suppose that the minimum of $f(x)$ has been bracketed in the interval
$(a, b)$ of length $h$. To telescope the interval, we evaluate the function at $x_1 = b - Rh$
and $x_2 = a + Rh$, as shown in Fig. 10.2(a). The constant $R$ will be determined shortly.
If $f_1 > f_2$ as indicated in the figure, the minimum lies in $(x_1, b)$; otherwise it is located
in $(a, x_2)$.

Figure 10.2. Golden section telescoping.

Assuming that $f_1 > f_2$, we set $a \leftarrow x_1$ and $x_1 \leftarrow x_2$, which yields a new interval $(a, b)$ of
length $h' = Rh$, as illustrated in Fig. 10.2(b). To carry out the next telescoping operation
we evaluate the function at $x_2 = a + Rh'$ and repeat the process.

The procedure works only if Figs. 10.2(a) and (b) are similar; i.e., if the same
constant $R$ locates $x_1$ and $x_2$ in both figures. Referring to Fig. 10.2(a), we note that
$x_2 - x_1 = 2Rh - h$. The same distance in Fig. 10.2(b) is $x_1 - a = h' - Rh'$. Equating
the two, we get

$$2Rh - h = h' - Rh'$$

Substituting $h' = Rh$ and cancelling $h$ yields

$$2R - 1 = R(1 - R)$$

the solution of which is the golden ratio$^{21}$:

$$R = \frac{-1 + \sqrt{5}}{2} = 0.618\,033\,989\ldots \tag{10.3}$$

Note that each telescoping decreases the interval containing the minimum by the
factor $R$, which is not as good as the factor of 0.5 in bisection. However, the golden
search method achieves this reduction with one function evaluation, whereas two
evaluations would be needed in bisection.

The number of telescopings required to reduce $h$ from $|b - a|$ to an error tolerance
$\varepsilon$ is given by

$$|b - a| R^n = \varepsilon$$

which yields

$$n = \frac{\ln(\varepsilon/|b - a|)}{\ln R} = -2.078\,087 \ln \frac{\varepsilon}{|b - a|} \tag{10.4}$$

21 $R$ is the ratio of the sides of a “golden rectangle,” considered by ancient Greeks to have the perfect
proportions.
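
Equations (10.3) and (10.4) can be verified numerically. The bracket length and the tolerance used below are assumed values picked for illustration; the script checks that R satisfies the similarity condition and that the computed number of telescopings is the smallest one that meets the tolerance.

```matlab
% Golden ratio and number of telescopings, Eqs. (10.3) and (10.4) (sketch)
R = (-1 + sqrt(5))/2;                  % golden ratio, Eq. (10.3)
residual = 2*R - 1 - R*(1 - R);        % similarity condition, should vanish
h = 1.0;                               % assumed initial bracket length |b - a|
tol = 1.0e-6;                          % assumed error tolerance
n = ceil(log(tol/h)/log(R));           % Eq. (10.4), rounded up
fprintf('R = %.9f, 1/ln(R) = %.6f, n = %d\n',R,1/log(R),n)
fprintf('h*R^(n-1) = %.3e, h*R^n = %.3e\n',h*R^(n-1),h*R^n)
```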


■ goldBracket 

This function contains the bracketing algorithm. For the factor that multiplies
successive search intervals we chose c = 1 + R.

function [a,b] = goldBracket(func,x1,h)
% Brackets the minimum point of f(x).
% USAGE: [a,b] = goldBracket(func,x1,h)
% INPUT:
% func = handle of function that returns f(x).
% x1   = starting value of x.
% h    = initial step size used in search.
% OUTPUT:
% a, b = limits on x at the minimum point.

c = 1.618033989;
f1 = feval(func,x1);
x2 = x1 + h; f2 = feval(func,x2);
% Determine downhill direction & change sign of h if needed.
if f2 > f1
    h = -h;
    x2 = x1 + h; f2 = feval(func,x2);
    % Check if minimum is between x1 - h and x1 + h
    if f2 > f1
        a = x2; b = x1 - h; return
    end
end
% Search loop
for i = 1:100
    h = c*h;
    x3 = x2 + h; f3 = feval(func,x3);
    if f3 > f2
        a = x1; b = x3; return
    end
    x1 = x2; f1 = f2; x2 = x3; f2 = f3;
end
error('goldbracket did not find minimum')

■ goldSearch

This function implements the golden section search algorithm.

function [xMin,fMin] = goldSearch(func,a,b,tol)
% Golden section search for the minimum of f(x).
% The minimum point must be bracketed in a <= x <= b.
% USAGE: [xMin,fMin] = goldSearch(func,a,b,tol)
% INPUT:
% func = handle of function that returns f(x).
% a, b = limits of the interval containing the minimum.
% tol  = error tolerance (default is 1.0e-6).
% OUTPUT:
% xMin = value of x at the minimum point.
% fMin = minimum value of f(x).

if nargin < 4; tol = 1.0e-6; end
nIter = ceil(-2.078087*log(tol/abs(b-a)));   % Eq. (10.4)
R = 0.618033989;
C = 1.0 - R;
% First telescoping
x1 = R*a + C*b;
x2 = C*a + R*b;
f1 = feval(func,x1);
f2 = feval(func,x2);
% Main loop
for i = 1:nIter
    if f1 > f2
        a = x1; x1 = x2; f1 = f2;
        x2 = C*a + R*b;
        f2 = feval(func,x2);
    else
        b = x2; x2 = x1; f2 = f1;
        x1 = R*a + C*b;
        f1 = feval(func,x1);
    end
end
if f1 < f2; fMin = f1; xMin = x1;
else; fMin = f2; xMin = x2;
end

EXAMPLE 10.1

Use goldSearch to find $x$ that minimizes

$$f(x) = 1.6x^3 + 3x^2 - 2x$$

subject to the constraint $x \ge 0$. Compare the result with the analytical solution.

Solution This is a constrained minimization problem. Either the minimum of $f(x)$ is
a stationary point in $x > 0$, or it is located at the constraint boundary $x = 0$. We handle
the constraint with the penalty function method by minimizing $f(x) + \lambda[\min(0, x)]^2$.

Starting at $x = 1$ and choosing $h = 0.1$ for the first step size in goldBracket (both
choices being rather arbitrary), we arrive at the following program:

% Example 10.1 (golden section minimization)
x = 1.0; h = 0.1;
[a,b] = goldBracket(@fex10_1,x,h);
[xMin,fMin] = goldSearch(@fex10_1,a,b)

The function to be minimized is

function y = fex10_1(x)
% Function used in Example 10.1.
lam = 1.0;               % Penalty function multiplier
c = min(0.0,x);          % Constraint penalty equation
y = 1.6*x^3 + 3.0*x^2 - 2.0*x + lam*c^2;

The output from the program is

>> xMin =
    0.2735
fMin =
   -0.2899

Since the minimum was found to be a stationary point, the constraint was not
active. Therefore, the penalty function was superfluous, but we did not know that at
the beginning.

The locations of stationary points are obtained analytically by solving

$$f'(x) = 4.8x^2 + 6x - 2 = 0$$

The positive root of this equation is $x = 0.273\,494$. As this is the only positive root,
there are no other stationary points in $x > 0$ that we must check out. The only other
possible location of a minimum is the constraint boundary $x = 0$. But here $f(0) = 0$
is larger than the function at the stationary point, leading to the conclusion that the
global minimum occurs at $x = 0.273\,494$.
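
The analytical part of the solution can be confirmed in a few lines. The sketch below uses MATLAB’s roots function to solve the quadratic $f'(x) = 0$ and then evaluates $f$ at the positive root; the variable names are chosen here for illustration.

```matlab
% Analytical check of Example 10.1 (sketch)
f = @(x) 1.6*x.^3 + 3*x.^2 - 2*x;  % function of Example 10.1
xs = roots([4.8 6 -2]);            % roots of f'(x) = 4.8x^2 + 6x - 2
xPos = xs(xs > 0);                 % the only stationary point in x > 0
fprintf('x = %.6f, f(x) = %.4f, f(0) = %g\n',xPos,f(xPos),f(0))
```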

EXAMPLE 10.2 



The trapezoid shown is the cross section of a beam. It is formed by removing the top
from a triangle of base B = 48 mm and height H = 60 mm. The problem is to find the
height y of the trapezoid that maximizes the section modulus

$$S = \bar{I}_x/c$$

where $\bar{I}_x$ is the second moment of the cross-sectional area about the axis that passes
through the centroid C of the cross section. By optimizing the section modulus,
we minimize the maximum bending stress $\sigma_{\max} = M/S$ in the beam, M being the
bending moment.

Solution Considering the area of the trapezoid as a composite of a rectangle and
two triangles, we find the section modulus through the following sequence of
computations:

Base of rectangle                      $a = B(H - y)/H$
Base of triangle                       $b = (B - a)/2$
Area                                   $A = (B + a)y/2$
First moment of area about x-axis      $Q_x = (ay)y/2 + 2(by/2)y/3$
Location of centroid                   $d = Q_x/A$
Distance involved in S                 $c = y - d$
Second moment of area about x-axis     $I_x = ay^3/3 + 2(by^3/12)$
Parallel axis theorem                  $\bar{I}_x = I_x - Ad^2$
Section modulus                        $S = \bar{I}_x/c$

We could use the formulas in the table to derive S as an explicit function of y, but
that would involve a lot of error-prone algebra and result in an overly complicated
expression. It makes more sense to let the computer do the work.

The program we used is listed below. As we wish to maximize S with a minimization
algorithm, the merit function is −S. There are no constraints in this problem.

% Example 10.2 (maximizing section modulus with golden section)
yStart = 60.0; h = 1.0;
[a,b] = goldBracket(@fex10_2,yStart,h);
[yopt,Sopt] = goldSearch(@fex10_2,a,b);
fprintf('optimal y = %7.4f\n',yopt)
fprintf('optimal S = %7.2f',-Sopt)

The function that computes the section modulus is

function S = fex10_2(y)
% Function used in Example 10.2
B = 48.0; H = 60.0;
a = B*(H - y)/H; b = (B - a)/2.0;    % Bases of rectangle and triangle
A = (B + a)*y/2.0;                   % Area
Q = (a*y^2)/2.0 + (b*y^2)/3.0;       % First moment of area
d = Q/A; c = y - d;                  % Centroid location, distance c
I = (a*y^3)/3.0 + (b*y^3)/6.0;       % Second moment about x-axis
Ibar = I - A*d^2; S = -Ibar/c;       % Merit function is -S

Here is the output: 

optimal y = 52.1763 
optimal S = 7864.43 

The section modulus of the original triangle is 7200; thus the optimal section 
modulus is a 9.2% improvement over the triangle. 
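
A brute-force scan over y provides an independent check of these numbers. The sketch below evaluates the formulas of the table in vectorized form on a fine grid; the grid limits and spacing are arbitrary choices made here, and the scan is meant only as a check, not as a substitute for goldSearch.

```matlab
% Grid scan of the section modulus S(y) of Example 10.2 (sketch)
B = 48.0; H = 60.0;
y = linspace(1,60,59001)';         % trial heights, spacing 0.001 mm (assumed)
a = B*(H - y)/H; b = (B - a)/2;    % bases of rectangle and triangle
A = (B + a).*y/2;                  % area
Q = a.*y.^2/2 + b.*y.^2/3;         % first moment of area about x-axis
d = Q./A; c = y - d;               % centroid location and distance c
I = a.*y.^3/3 + b.*y.^3/6;         % second moment of area about x-axis
Ibar = I - A.*d.^2;                % parallel axis theorem
S = Ibar./c;                       % section modulus
[Sbest,k] = max(S); yBest = y(k);
Stri = S(end);                     % y = H recovers the full triangle
fprintf('best y = %.3f, best S = %.2f, triangle S = %.2f\n',yBest,Sbest,Stri)
```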


10.3 Conjugate Gradient Methods 
Introduction 

We now look at optimization in n-dimensional design space. The objective is to
minimize $F(\mathbf{x})$, where the components of $\mathbf{x}$ are the n independent design variables. One
way to tackle the problem is to use a succession of one-dimensional minimizations
to close in on the optimal point. The basic strategy is

• Choose a point $\mathbf{x}_0$ in the design space.
• loop with i = 1, 2, 3, ...
    Choose a vector $\mathbf{v}_i$.
    Minimize $F(\mathbf{x})$ along the line through $\mathbf{x}_{i-1}$ in the direction of $\mathbf{v}_i$. Let the minimum
    point be $\mathbf{x}_i$.
    if $|\mathbf{x}_i - \mathbf{x}_{i-1}| < \varepsilon$ exit loop
• end loop

The minimization along a line can be accomplished with any one-dimensional
optimization algorithm (such as the golden section search). The only question left
open is how to choose the vectors $\mathbf{v}_i$.


Conjugate Directions 

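Consider the quadratic function

$$F(\mathbf{x}) = c - \sum_i b_i x_i + \frac{1}{2}\sum_i \sum_j A_{ij} x_i x_j = c - \mathbf{b}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} \tag{10.5}$$

Differentiation with respect to $x_i$ yields

$$\frac{\partial F}{\partial x_i} = -b_i + \sum_j A_{ij} x_j$$

which can be written in vector notation as

$$\nabla F = -\mathbf{b} + \mathbf{A}\mathbf{x} \tag{10.6}$$

where $\nabla F$ is the gradient of $F$.

Now consider the change in the gradient as we move from point $\mathbf{x}_0$ in the direction
of a vector $\mathbf{u}$. The motion takes place along the line

$$\mathbf{x} = \mathbf{x}_0 + s\mathbf{u}$$

where $s$ is the distance moved. Substitution into Eq. (10.6) yields the expression for
the gradient along $\mathbf{u}$:

$$\nabla F|_{\mathbf{x}_0 + s\mathbf{u}} = -\mathbf{b} + \mathbf{A}(\mathbf{x}_0 + s\mathbf{u}) = \nabla F|_{\mathbf{x}_0} + s\mathbf{A}\mathbf{u}$$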

Note that the change in the gradient is $s\mathbf{A}\mathbf{u}$. If this change is perpendicular to a vector
$\mathbf{v}$; that is, if

$$\mathbf{v}^T\mathbf{A}\mathbf{u} = 0 \tag{10.7}$$

the directions of $\mathbf{u}$ and $\mathbf{v}$ are said to be mutually conjugate (noninterfering). The
implication is that once we have minimized $F(\mathbf{x})$ in the direction of $\mathbf{v}$, we can move
along $\mathbf{u}$ without ruining the previous minimization.

For a quadratic function of n independent variables it is possible to construct n
mutually conjugate directions. Therefore, it would take precisely n line minimizations
along these directions to reach the minimum point. If $F(\mathbf{x})$ is not a quadratic function,
Eq. (10.5) can be treated as a local approximation of the merit function, obtained by
truncating the Taylor series expansion of $F(\mathbf{x})$ about $\mathbf{x}_0$ (see Appendix A1):

$$F(\mathbf{x}) \approx F(\mathbf{x}_0) + \nabla F(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T\mathbf{H}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$

Now the conjugate directions based on the quadratic form are only approximations,
valid in the close vicinity of $\mathbf{x}_0$. Consequently, it would take several cycles of n line
minimizations to reach the optimal point.
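
The claim that n line minimizations along conjugate directions suffice for a quadratic can be tried out on a small case. In the sketch below the matrix A and the vector b are hypothetical, the second direction is made conjugate to the first by a Gram-Schmidt-like step, and the line minimizations are carried out with the closed-form step $s = -\mathbf{v}^T\nabla F/(\mathbf{v}^T\mathbf{A}\mathbf{v})$, which is valid only for quadratic functions.

```matlab
% Two line minimizations along conjugate directions (illustrative sketch)
A = [4 1; 1 3]; b = [1; 2];        % hypothetical quadratic, Eq. (10.5)
v1 = [1; 0];                       % first direction
e2 = [0; 1];
v2 = e2 - ((v1'*A*e2)/(v1'*A*v1))*v1;   % made conjugate to v1, Eq. (10.7)
x = [0; 0];                        % starting point
dirs = [v1 v2];
for i = 1:2
    v = dirs(:,i);
    gradF = -b + A*x;              % gradient, Eq. (10.6)
    s = -(v'*gradF)/(v'*A*v);      % exact minimizer along x + s*v
    x = x + s*v;
end
xExact = A\b;                      % stationary point: gradient vanishes
fprintf('v1''*A*v2 = %g, error = %8.1e\n',v1'*A*v2,norm(x - xExact))
```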

The various conjugate gradient methods use different techniques for constructing
conjugate directions. The so-called zero-order methods work with $F(\mathbf{x})$ only, whereas
the first-order methods utilize both $F(\mathbf{x})$ and $\nabla F$. The first-order methods are
computationally more efficient, of course, but the input of $\nabla F$ (if it is available at all) can
be very tedious.

Powell's Method 

Powell’s method is a zero-order method, requiring the evaluation of $F(\mathbf{x})$ only. If the
problem involves n design variables, the basic algorithm is

• Choose a point $\mathbf{x}_0$ in the design space.
• Choose the starting vectors $\mathbf{v}_i$, i = 1, 2, ..., n (the usual choice is $\mathbf{v}_i = \mathbf{e}_i$, where $\mathbf{e}_i$
  is the unit vector in the $x_i$-coordinate direction).
• cycle
    do with i = 1, 2, ..., n
        Minimize $F(\mathbf{x})$ along the line through $\mathbf{x}_{i-1}$ in the direction of $\mathbf{v}_i$. Let the
        minimum point be $\mathbf{x}_i$.
    end do
    $\mathbf{v}_{n+1} \leftarrow \mathbf{x}_0 - \mathbf{x}_n$ (this vector is conjugate to $\mathbf{v}_{n+1}$ produced in the previous loop)
    Minimize $F(\mathbf{x})$ along the line through $\mathbf{x}_0$ in the direction of $\mathbf{v}_{n+1}$. Let the minimum
    point be $\mathbf{x}_{n+1}$.
    if $|\mathbf{x}_{n+1} - \mathbf{x}_0| < \varepsilon$ exit loop
    do with i = 1, 2, ..., n
        $\mathbf{v}_i \leftarrow \mathbf{v}_{i+1}$ ($\mathbf{v}_1$ is discarded, the other vectors are reused)
    end do
• end cycle

Powell demonstrated that the vectors $\mathbf{v}_{n+1}$ produced in successive cycles are
mutually conjugate, so that the minimum point of a quadratic surface is reached in
precisely n cycles. In practice, the merit function is seldom quadratic, but as long as
it can be approximated locally by Eq. (10.5), Powell’s method will work. Of course, it
usually takes more than n cycles to arrive at the minimum of a nonquadratic function.
Note that it takes n line minimizations to construct each conjugate direction.

Figure 10.3(a) illustrates one typical cycle of the method in a two-dimensional
design space (n = 2). We start with point $\mathbf{x}_0$ and vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. Then we find the
distance $s_1$ that minimizes $F(\mathbf{x}_0 + s\mathbf{v}_1)$, finishing up at point $\mathbf{x}_1 = \mathbf{x}_0 + s_1\mathbf{v}_1$. Next, we
determine $s_2$ that minimizes $F(\mathbf{x}_1 + s\mathbf{v}_2)$, which takes us to $\mathbf{x}_2 = \mathbf{x}_1 + s_2\mathbf{v}_2$. The last
search direction is $\mathbf{v}_3 = \mathbf{x}_2 - \mathbf{x}_0$. After finding $s_3$ by minimizing $F(\mathbf{x}_0 + s\mathbf{v}_3)$ we get to
$\mathbf{x}_3 = \mathbf{x}_0 + s_3\mathbf{v}_3$, completing the cycle.



Figure 10.3(b) shows the moves carried out in two cycles superimposed on the
contour map of a quadratic surface. As explained before, the first cycle starts at point
$P_0$ and ends up at $P_3$. The second cycle takes us to $P_6$, which is the optimal point. The
directions $P_0P_3$ and $P_3P_6$ are mutually conjugate.
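The two-cycle behavior on a quadratic surface can be checked numerically. The lines below are a minimal sketch under stated assumptions, not the powell routine of this section: the quadratic $F = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x}$, the data A and b, and the starting point are illustrative choices, and the helper lineMin (a hypothetical name) replaces bracketing and golden section search by the closed-form line minimum that exists only for quadratics.

```matlab
% Basic Powell cycles on an assumed quadratic F = 0.5*x'*A*x + b'*x
A = [20 -10; -10 6];          % assumed positive definite matrix
b = [2; 0];                   % assumed linear term
% Exact minimum of F along the line x + s*v (valid for quadratics only)
lineMin = @(x,v) x - (dot(A*x + b,v)/dot(v,A*v))*v;

x = [1; 1];                   % assumed starting point
u = eye(2);                   % columns hold the search directions v1, v2
for cyc = 1:2                 % n = 2 cycles
    x0 = x;
    for i = 1:2               % n line minimizations along v1, v2
        x = lineMin(x,u(:,i));
    end
    v = x - x0;               % new direction v3 (sign is immaterial)
    x = lineMin(x,v);         % last line minimization of the cycle
    u = [u(:,2) v];           % discard v1, reuse the other directions
end
gradNorm = norm(A*x + b);     % gradient of F at the final point
fprintf('x = [%8.5f %8.5f], |grad F| = %8.1e\n',x(1),x(2),gradNorm)
```

If the gradient norm printed at the end is at roundoff level, the second cycle has landed on the minimum, which is what the conjugacy of the two generated directions predicts.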

Powell’s method does have a major flaw that has to be remedied: if $F(\mathbf{x})$ is not
a quadratic, the algorithm tends to produce search directions that gradually become
linearly dependent, thereby ruining the progress towards the minimum. The source
of the problem is the automatic discarding of $\mathbf{v}_1$ at the end of each cycle. It has been
suggested that it is better to throw out the direction that resulted in the largest decrease
of $F(\mathbf{x})$, a policy that we adopt. It seems counterintuitive to discard the best direction,
but it is likely to be close to the direction added in the next cycle, thereby contributing
to linear dependence. As a result of the change, the search directions cease to be
mutually conjugate, so that a quadratic form is not minimized in $n$ cycles any more.
This is not a significant loss since in practice $F(\mathbf{x})$ is seldom a quadratic anyway.

Powell suggested a few other refinements to speed up convergence. Since they 
complicate the bookkeeping considerably, we did not implement them. 


■ powell 

The algorithm for Powell’s method is listed below. It utilizes two arrays: df contains
the decreases of the merit function in the first $n$ moves of a cycle, and the matrix u
stores the corresponding direction vectors $\mathbf{v}_i$ (one vector per column).

function [xMin,fMin,nCyc] = powell(h,tol)
% Powell's method for minimizing f(x1,x2,...,xn).
% USAGE: [xMin,fMin,nCyc] = powell(h,tol)
% INPUT:
% h    = initial search increment (default = 0.1).
% tol  = error tolerance (default = 1.0e-6).
% GLOBALS (must be declared GLOBAL in calling program):
% X    = starting point
% FUNC = handle of function that returns f.
% OUTPUT:
% xMin = minimum point
% fMin = minimum value of f
% nCyc = number of cycles to convergence

global X FUNC V
if nargin < 2; tol = 1.0e-6; end
if nargin < 1; h = 0.1; end
if size(X,2) > 1; X = X'; end   % X must be column vector
n = length(X);                  % Number of design variables
df = zeros(n,1);                % Decreases of f stored here
u = eye(n);                     % Columns of u store search directions V
for j = 1:30                    % Allow up to 30 cycles
    xOld = X;
    fOld = feval(FUNC,xOld);
    % First n line searches record the decrease of f
    for i = 1:n
        V = u(1:n,i);
        [a,b] = goldBracket(@fLine,0.0,h);
        [s,fMin] = goldSearch(@fLine,a,b);
        df(i) = fOld - fMin;
        fOld = fMin;
        X = X + s*V;
    end
    % Last line search in the cycle
    V = X - xOld;
    [a,b] = goldBracket(@fLine,0.0,h);
    [s,fMin] = goldSearch(@fLine,a,b);
    X = X + s*V;
    % Check for convergence
    if sqrt(dot(X-xOld,X-xOld)/n) < tol
        xMin = X; nCyc = j; return
    end
    % Identify biggest decrease of f & update search directions
    iMax = 1; dfMax = df(1);
    for i = 2:n
        if df(i) > dfMax
            iMax = i; dfMax = df(i);
        end
    end
    for i = iMax:n-1
        u(1:n,i) = u(1:n,i+1);
    end
    u(1:n,n) = V;
end
error('Powell method did not converge')

function z = fLine(s)   % F in the search direction V
global X FUNC V
z = feval(FUNC,X+s*V);

EXAMPLE 10.3 

Find the minimum of the function [22]

$$F = 100(y - x^2)^2 + (1 - x)^2$$

with Powell’s method starting at the point $(-1, 1)$. This function has an interesting
topology. The minimum value of $F$ occurs at the point $(1, 1)$. As seen in the figure,
there is a hump between the starting and minimum points which the algorithm must
negotiate.

[22] From Shoup, T. E., and Mistree, F., Optimization Methods with Applications for Personal Computers,
Prentice-Hall, 1987.



Solution The program that solves this unconstrained optimization problem is 


% Example 10.3 (Powell's method of minimization)
global X FUNC
FUNC = @fex10_3;
X = [-1.0; 1.0];
[xMin,fMin,numCycles] = powell


Note that powell receives X and the function handle FUNC as global variables.
The routine for the function to be minimized is

function y = fex10_3(X)
y = 100.0*(X(2) - X(1)^2)^2 + (1.0 - X(1))^2;

Here are the results:


>> xMin =
    1.0000
    1.0000
fMin =
    1.0072e-024
numCycles =
    12
EXAMPLE 10.4

Use powell to determine the smallest distance from the point $(5, 8)$ to the curve
$xy = 5$.

Solution This is a constrained optimization problem: minimize $F(x, y) = (x - 5)^2 +
(y - 8)^2$ (the square of the distance) subject to the equality constraint $xy - 5 = 0$. The
following program uses Powell’s method with a penalty function:

% Example 10.4 (Powell's method of minimization)
global X FUNC
FUNC = @fex10_4;
X = [1.0; 5.0];
[xMin,fMin,nCyc] = powell;
fprintf('Intersection point = %8.5f %8.5f\n',X(1),X(2))
xy = X(1)*X(2);
fprintf('Constraint x*y = %8.5f\n',xy)
dist = sqrt((X(1) - 5.0)^2 + (X(2) - 8.0)^2);
fprintf('Distance = %8.5f\n',dist)
fprintf('Number of cycles = %2.0f',nCyc)

The penalty is incorporated in the M-file of the function to be minimized:

function y = fex10_4(X)
% Function used in Example 10.4
lam = 1.0;                 % Penalty multiplier
c = X(1)*X(2) - 5.0;       % Constraint equation
distSq = (X(1) - 5.0)^2 + (X(2) - 8.0)^2;
y = distSq + lam*c^2;

As mentioned before, the value of the penalty function multiplier $\lambda$ (called lam
in the program) can have profound effects on the result. We chose $\lambda = 1$ (as shown in
the listing of fex10_4) with the following result:

>> Intersection point = 0.73307 7.58776
Constraint x*y = 5.56234
Distance = 4.28680
Number of cycles = 7

The small value of $\lambda$ favored speed of convergence over accuracy. Since the violation
of the constraint $xy = 5$ is clearly unacceptable, we ran the program again with
$\lambda = 10\,000$ and changed the starting point to $(0.73307, 7.58776)$, the end point of the
first run. The results shown below are now acceptable.

>> Intersection point = 0.65561 7.62654
Constraint x*y = 5.00006
Distance = 4.36041
Number of cycles = 4

Could we have used $\lambda = 10\,000$ in the first run? In this case we would be lucky and
obtain the minimum in 17 cycles. Hence we save only six cycles by using two runs.
However, a large $\lambda$ often causes the algorithm to hang up, so that it is generally wise to
start with a small $\lambda$.
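The two-run strategy can be tried without the powell routine. The following is a rough sketch that assumes MATLAB's built-in fminsearch (Nelder-Mead) as a stand-in minimizer; the handle name penalized and the option values are illustrative assumptions, and the cycle counts quoted above do not carry over to this solver.

```matlab
% Penalty-function runs for Example 10.4 with fminsearch as a stand-in
% for powell (assumption: any unconstrained minimizer can be used here)
penalized = @(x,lam) (x(1) - 5.0)^2 + (x(2) - 8.0)^2 ...
                   + lam*(x(1)*x(2) - 5.0)^2;
opts = optimset('TolX',1.0e-8,'TolFun',1.0e-8, ...
                'MaxIter',2000,'MaxFunEvals',4000);

% First run: small multiplier, starting point of Example 10.4
x1 = fminsearch(@(x) penalized(x,1.0),[1.0 5.0],opts);
% Second run: large multiplier, restarted from the end point of run 1
x2 = fminsearch(@(x) penalized(x,1.0e4),x1,opts);

c1 = x1(1)*x1(2) - 5.0;       % constraint violation after first run
c2 = x2(1)*x2(2) - 5.0;       % constraint violation after second run
fprintf('lam = 1:     x*y - 5 = %8.5f\n',c1)
fprintf('lam = 10000: x*y - 5 = %8.5f\n',c2)
```

Restarting from the end point of the first run keeps the second, much stiffer problem close to its solution, which is the reason for starting with a small multiplier.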


Fletcher-Reeves Method 


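Let us assume again that the merit function has the quadratic form in Eq. (10.5). Given
a direction $\mathbf{v}$, it took Powell’s method $n$ line minimizations to construct a conjugate
direction. We can reduce this to a single line minimization with a first-order method.
Here is the procedure, known as the Fletcher-Reeves method:


• Choose a starting point $\mathbf{x}_0$.

• $\mathbf{g}_0 \leftarrow -\nabla F(\mathbf{x}_0)$

• $\mathbf{v}_0 \leftarrow \mathbf{g}_0$ (lacking a previous search direction, we choose the steepest descent).

• loop with $i = 0, 1, 2, \ldots$

    Minimize $F(\mathbf{x})$ along $\mathbf{v}_i$; let the minimum point be $\mathbf{x}_{i+1}$.
    $\mathbf{g}_{i+1} \leftarrow -\nabla F(\mathbf{x}_{i+1})$.

    if $|\mathbf{g}_{i+1}| < \varepsilon$ or $|F(\mathbf{x}_{i+1}) - F(\mathbf{x}_i)| < \varepsilon$ exit loop (convergence criterion).

    $\gamma \leftarrow (\mathbf{g}_{i+1} \cdot \mathbf{g}_{i+1})/(\mathbf{g}_i \cdot \mathbf{g}_i)$.

    $\mathbf{v}_{i+1} \leftarrow \mathbf{g}_{i+1} + \gamma \mathbf{v}_i$.

• end loop


It can be shown that $\mathbf{v}_i$ and $\mathbf{v}_{i+1}$ are mutually conjugate; that is, they satisfy the
relationship $\mathbf{v}_i^T \mathbf{A} \mathbf{v}_{i+1} = 0$. Also $\mathbf{g}_i \cdot \mathbf{g}_{i+1} = 0$.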

The Fletcher-Reeves method will find the minimum of a quadratic function in
$n$ iterations. If $F(\mathbf{x})$ is not quadratic, it is necessary to restart the process after every
$n$ iterations. A variant of the Fletcher-Reeves method replaces the expression for $\gamma$ by

$$\gamma = \frac{(\mathbf{g}_{i+1} - \mathbf{g}_i) \cdot \mathbf{g}_{i+1}}{\mathbf{g}_i \cdot \mathbf{g}_i} \tag{10.6}$$

For a quadratic $F(\mathbf{x})$ this change makes no difference since $\mathbf{g}_i$ and $\mathbf{g}_{i+1}$ are orthogonal.
However, for merit functions that are not quadratic, Eq. (10.6) is claimed to eliminate
the need for a restart after $n$ iterations.
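That the two expressions for $\gamma$ coincide on a quadratic can be seen with a few lines of MATLAB. This is only a sketch under stated assumptions: the quadratic $F = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x}$ with the data A, b and the starting point are illustrative, the line minimum is taken in closed form (possible only for quadratics), and the names gammaFR and gamma106 are hypothetical.

```matlab
% Two expressions for gamma on an assumed quadratic F = 0.5*x'*A*x + b'*x
A = [20 -10; -10 6];  b = [2; 0];       % assumed data
x0 = [0; 0];                            % assumed starting point
g0 = -(A*x0 + b);                       % g0 = -grad F(x0)
v0 = g0;                                % steepest descent direction
s  = dot(g0,v0)/dot(v0,A*v0);           % exact line minimum (quadratic only)
x1 = x0 + s*v0;
g1 = -(A*x1 + b);                       % g1 = -grad F(x1)

gammaFR  = dot(g1,g1)/dot(g0,g0);       % Fletcher-Reeves expression
gamma106 = dot(g1 - g0,g1)/dot(g0,g0);  % expression of Eq. (10.6)
fprintf('g0.g1 = %g, gammaFR = %g, gamma106 = %g\n', ...
        dot(g0,g1),gammaFR,gamma106)
```

The two values differ only by the term $\mathbf{g}_i \cdot \mathbf{g}_{i+1}$, which vanishes after an exact line search on a quadratic.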







10.3 Conjugate Gradient Methods 


■ fletcherReeves 

function [xMin,fMin,nCyc] = fletcherReeves(h,tol)
% Fletcher-Reeves method for minimizing f(x1,x2,...,xn).
% USAGE: [xMin,fMin,nCyc] = fletcherReeves(h,tol)
% INPUT:
% h     = initial search increment (default = 0.1).
% tol   = error tolerance (default = 1.0e-6).
% GLOBALS (must be declared GLOBAL in calling program):
% X     = starting point.
% FUNC  = handle of function that returns F.
% DFUNC = handle of function that returns grad(F).
% OUTPUT:
% xMin  = minimum point.
% fMin  = minimum value of f.
% nCyc  = number of cycles to convergence.

global X FUNC DFUNC V
if nargin < 2; tol = 1.0e-6; end
if nargin < 1; h = 0.1; end
if size(X,2) > 1; X = X'; end   % X must be column vector
n = length(X);                  % Number of design variables
g0 = -feval(DFUNC,X);
V = g0;
for i = 1:50
    [a,b] = goldBracket(@fLine,0.0,h);
    [s,fMin] = goldSearch(@fLine,a,b);
    X = X + s*V;
    g1 = -feval(DFUNC,X);
    if sqrt(dot(g1,g1)) <= tol
        xMin = X; nCyc = i; return
    end
    gamma = dot((g1 - g0),g1)/dot(g0,g0);   % Eq. (10.6)
    V = g1 + gamma*V;
    g0 = g1;
end
error('Fletcher-Reeves method did not converge')

function z = fLine(s)   % F in the search direction V
global X FUNC V
z = feval(FUNC,X+s*V);
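EXAMPLE 10.5

Use the Fletcher-Reeves method to locate the minimum of

$$F(\mathbf{x}) = 10x_1^2 + 3x_2^2 - 10x_1x_2 + 2x_1$$

Start with $\mathbf{x}_0 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$.

Solution Since $F(\mathbf{x})$ is quadratic, we need only two iterations. The gradient of $F$ is

$$\nabla F(\mathbf{x}) = \begin{bmatrix} 20x_1 - 10x_2 + 2 \\ -10x_1 + 6x_2 \end{bmatrix}$$

First iteration:

$$\mathbf{g}_0 = -\nabla F(\mathbf{x}_0) = \begin{bmatrix} -2 \\ 0 \end{bmatrix} \qquad \mathbf{v}_0 = \mathbf{g}_0 = \begin{bmatrix} -2 \\ 0 \end{bmatrix} \qquad \mathbf{x}_0 + s\mathbf{v}_0 = \begin{bmatrix} -2s \\ 0 \end{bmatrix}$$

$$f(s) = F(\mathbf{x}_0 + s\mathbf{v}_0) = 10(2s)^2 + 3(0)^2 - 10(-2s)(0) + 2(-2s) = 40s^2 - 4s$$

$$f'(s) = 80s - 4 = 0 \qquad s = 0.05$$

$$\mathbf{x}_1 = \mathbf{x}_0 + s\mathbf{v}_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + 0.05 \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} -0.1 \\ 0 \end{bmatrix}$$

Second iteration:

$$\mathbf{g}_1 = -\nabla F(\mathbf{x}_1) = \begin{bmatrix} -20(-0.1) + 10(0) - 2 \\ 10(-0.1) - 6(0) \end{bmatrix} = \begin{bmatrix} 0 \\ -1.0 \end{bmatrix}$$

$$\gamma = \frac{\mathbf{g}_1 \cdot \mathbf{g}_1}{\mathbf{g}_0 \cdot \mathbf{g}_0} = \frac{1.0}{4} = 0.25$$

$$\mathbf{v}_1 = \mathbf{g}_1 + \gamma \mathbf{v}_0 = \begin{bmatrix} 0 \\ -1.0 \end{bmatrix} + 0.25 \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} -0.5 \\ -1.0 \end{bmatrix}$$

$$\mathbf{x}_1 + s\mathbf{v}_1 = \begin{bmatrix} -0.1 \\ 0 \end{bmatrix} + s \begin{bmatrix} -0.5 \\ -1.0 \end{bmatrix} = \begin{bmatrix} -0.1 - 0.5s \\ -s \end{bmatrix}$$

$$f(s) = F(\mathbf{x}_1 + s\mathbf{v}_1) = 10(-0.1 - 0.5s)^2 + 3(-s)^2 - 10(-0.1 - 0.5s)(-s) + 2(-0.1 - 0.5s) = 0.5s^2 - s - 0.1$$

$$f'(s) = s - 1 = 0 \qquad s = 1.0$$

$$\mathbf{x}_2 = \mathbf{x}_1 + s\mathbf{v}_1 = \begin{bmatrix} -0.1 \\ 0 \end{bmatrix} + 1.0 \begin{bmatrix} -0.5 \\ -1.0 \end{bmatrix} = \begin{bmatrix} -0.6 \\ -1.0 \end{bmatrix}$$

We have now reached the minimum point.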
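The hand computation can be reproduced in a few lines. The sketch below is not the fletcherReeves routine: it assumes that $F$ is written as $\frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x}$ with A and b read off from the merit function of this example, and it uses the closed-form step $s = (\mathbf{g} \cdot \mathbf{v})/(\mathbf{v}^T\mathbf{A}\mathbf{v})$ in place of a golden section search, which is legitimate only for a quadratic.

```matlab
% Fletcher-Reeves iterations for F = 10*x1^2 + 3*x2^2 - 10*x1*x2 + 2*x1,
% written as F = 0.5*x'*A*x + b'*x (A and b are read off from F)
A = [20 -10; -10 6];
b = [2; 0];
x = [0; 0];                      % starting point x0
g = -(A*x + b);                  % g0 = -grad F(x0)
v = g;                           % v0 = g0
for i = 1:2                      % n = 2 iterations for a quadratic
    s = dot(g,v)/dot(v,A*v);     % exact line minimum along v
    x = x + s*v;
    gNew = -(A*x + b);
    gamma = dot(gNew,gNew)/dot(g,g);
    v = gNew + gamma*v;          % next conjugate direction
    g = gNew;
    fprintf('iteration %d: s = %5.2f, x = [%5.2f %5.2f]\n',i,s,x(1),x(2))
end
```

The step lengths and points printed by the loop can be compared with the values of $s$, $\mathbf{x}_1$ and $\mathbf{x}_2$ obtained by hand.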


EXAMPLE 10.6 



The figure shows the cross section of a channel carrying water. Determine $h$, $b$
and $\theta$ that minimize the length of the wetted perimeter while maintaining a cross-sectional
area of 8 m$^2$. (Minimizing the wetted perimeter results in least resistance to
the flow.) Use the Fletcher-Reeves method.

Solution The cross-sectional area of the channel is

$$A = \frac{1}{2}\left[b + (b + 2h\tan\theta)\right]h = (b + h\tan\theta)h$$

and the length of the wetted perimeter is

$$S = b + 2(h\sec\theta)$$

The optimization problem can be cast as

minimize $b + 2h\sec\theta$
subject to $(b + h\tan\theta)h = 8$

Equality constraints can often be used to eliminate some of the design variables.
In this case we can solve the area constraint for $b$, obtaining

$$b = \frac{8}{h} - h\tan\theta$$

Substituting the result into the expression for $S$, we get

$$S = \frac{8}{h} - h\tan\theta + 2h\sec\theta$$

We have now arrived at an unconstrained optimization problem of finding $h$ and $\theta$
that minimize $S$. The gradient of the merit function is

$$\nabla S = \begin{bmatrix} \partial S/\partial h \\ \partial S/\partial\theta \end{bmatrix} = \begin{bmatrix} -8/h^2 - \tan\theta + 2\sec\theta \\ -h\sec^2\theta + 2h\sec\theta\tan\theta \end{bmatrix}$$

Letting $\mathbf{x} = \begin{bmatrix} h & \theta \end{bmatrix}^T$ and starting with $\mathbf{x}_0 = \begin{bmatrix} 2 & 0 \end{bmatrix}^T$, we arrive at the following
program:


% Example 10.6 (Minimization with Fletcher-Reeves)
global X FUNC DFUNC
FUNC = @fex10_6; DFUNC = @dfex10_6;
X = [2.0; 0.0];
[xMin,fMin,nCyc] = fletcherReeves;
b = 8.0/X(1) - X(1)*tan(X(2));
theta = X(2)*180.0/pi;   % Convert into degrees
fprintf('b = %8.5f\n',b)
fprintf('h = %8.5f\n',X(1))
fprintf('theta = %8.5f\n',theta)
fprintf('perimeter = %8.5f\n',fMin)
fprintf('number of cycles = %2.0f',nCyc)


Note that the starting point X and the function handles FUNC (function defining
$F$) and DFUNC (function defining $\nabla F$) are declared global. The M-files for the two
functions are


function y = fex10_6(X)
% Function defining F in Example 10.6
y = 8.0/X(1) - X(1)*(tan(X(2)) - 2.0/cos(X(2)));

function g = dfex10_6(X)
% Function defining grad(F) in Example 10.6
g = zeros(2,1);
g(1) = -8.0/(X(1)^2) - tan(X(2)) + 2.0/cos(X(2));
g(2) = X(1)*(-1.0/cos(X(2)) + 2.0*tan(X(2)))/cos(X(2));

The results are ($\theta$ is in degrees):

>> b = 2.48161
h = 2.14914
theta = 30.00000
perimeter = 7.44484
number of cycles = 5
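Because a first-order method depends on a correctly coded gradient, it is prudent to compare the analytical gradient with a finite difference estimate before running the minimization. The lines below are a sketch of such a check; the anonymous functions S and gradS restate the merit function and gradient of this example, while the step size and the evaluation point (the rounded result printed above) are assumptions.

```matlab
% Central-difference check of the gradient used in Example 10.6
S = @(x) 8.0/x(1) - x(1)*(tan(x(2)) - 2.0/cos(x(2)));
gradS = @(x) [-8.0/x(1)^2 - tan(x(2)) + 2.0/cos(x(2)); ...
              x(1)*(-1.0/cos(x(2)) + 2.0*tan(x(2)))/cos(x(2))];

x = [2.14914; pi/6];          % reported optimum: h and theta (radians)
h = 1.0e-6;                   % assumed finite difference step
gNum = zeros(2,1);
for k = 1:2
    e = zeros(2,1); e(k) = h; % perturb one variable at a time
    gNum(k) = (S(x + e) - S(x - e))/(2.0*h);
end
fprintf('analytical: %12.4e %12.4e\n',gradS(x))
fprintf('numerical:  %12.4e %12.4e\n',gNum)
```

Both gradients should agree closely, and both should be near zero because the evaluation point is the computed minimum.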
PROBLEM SET 10.1


1. ■ The Lennard-Jones potential between two molecules is

$$V = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]$$

where $\varepsilon$ and $\sigma$ are constants, and $r$ is the distance between the molecules. Use the
functions goldBracket and goldSearch to find $\sigma/r$ that minimizes the potential
and verify the result analytically.

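2. ■ One wave function of the hydrogen atom is

$$\psi = C\left(27 - 18\sigma + 2\sigma^2\right)e^{-\sigma/3}$$

where

$$\sigma = zr/a_0 \qquad C = \frac{1}{81\sqrt{3\pi}}\left(\frac{z}{a_0}\right)^{2/3}$$

$z$ = nuclear charge
$a_0$ = Bohr radius
$r$ = radial distance

Find $\sigma$ where $\psi$ is at a minimum. Verify the result analytically.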

3. ■ Determine the parameter $p$ that minimizes the integral

$$\int_0^{\pi} \sin x \cos px \, dx$$

Hint: use numerical quadrature to evaluate the integral.

4. ■ [Figure: electrical circuit with two loops; legible labels include $R_2 = 3.6\ \Omega$, $R_3 = 1.5\ \Omega$, $R_4 = 1.8\ \Omega$, the source $E$, the resistance $R$ and the loop currents $i_1$, $i_2$]

Kirchhoff’s equations for the two loops of the electrical circuit are

$$R_1 i_1 + R_3 i_1 + R(i_1 - i_2) = E$$
$$R_2 i_2 + R_4 i_2 + R_5 i_2 + R(i_2 - i_1) = 0$$

Find the resistance $R$ that maximizes the power dissipated by $R$. Hint: solve
Kirchhoff’s equations numerically with one of the functions in Chapter 2.
5. ■

A wire carrying an electric current is surrounded by rubber insulation of outer
radius $r$. The resistance of the wire generates heat, which is conducted through
the insulation and convected into the surrounding air. The temperature of the
wire can be shown to be

$$T = \frac{q}{2\pi}\left(\frac{\ln(r/a)}{k} + \frac{1}{hr}\right) + T_\infty$$

where

$q$ = rate of heat generation in wire = 50 W/m
$a$ = radius of wire = 5 mm
$k$ = thermal conductivity of rubber = 0.16 W/m · K
$h$ = convective heat-transfer coefficient = 20 W/m$^2$ · K
$T_\infty$ = ambient temperature = 280 K

Find $r$ that minimizes $T$.

6. ■ Minimize the function

$$F(x, y) = (x - 1)^2 + (y - 1)^2$$

subject to the constraints $x + y \le 1$ and $x \ge 0.6$.

7. ■ Find the minimum of the function

$$F(x, y) = 6x^2 + y^3 + xy$$

in $y \ge 0$. Verify the result analytically.

8. ■ Solve Prob. 7 if the constraint is changed to $y \ge -2$.

9. ■ Determine the smallest distance from the point $(1, 2)$ to the parabola $y = x^2$.
10. ■ [Figure: area with labeled dimensions 0.2 m, 0.4 m and 0.4 m, and the quantities $x$ and $d$]

Determine $x$ that minimizes the distance $d$ between the base of the area shown
and its centroid $C$.

11. ■

The cylindrical vessel of mass $M$ has its center of gravity at $C$. The water in
the vessel has a depth $x$. Determine $x$ so that the center of gravity of the
vessel-water combination is as low as possible. Use $M = 115$ kg, $H = 0.8$ m and
$r = 0.25$ m.

12. ■ [Figure: sheet of cardboard with dimensions $a$ and $b$]

The sheet of cardboard is folded along the dashed lines to form a box with
an open top. If the volume of the box is to be 1.0 m$^3$, determine the dimensions
$a$ and $b$ that would use the least amount of cardboard. Verify the result
analytically.
13. ■

The elastic cord $ABC$ has an extensional stiffness $k$. When the vertical force $P$ is
applied at $B$, the cord deforms to the shape $AB'C$. The potential energy of the
system in the deformed position is

$$V = -Pv + \frac{k(a + b)}{2a}\delta_{AB}^2 + \frac{k(a + b)}{2b}\delta_{BC}^2$$

where

$$\delta_{AB} = \sqrt{(a + u)^2 + v^2} - a$$
$$\delta_{BC} = \sqrt{(b - u)^2 + v^2} - b$$

are the elongations of $AB$ and $BC$. Determine the displacements $u$ and $v$ by minimizing
$V$ (this is an application of the principle of minimum potential energy:
a system is in stable equilibrium if its potential energy is at a minimum). Use
$a = 150$ mm, $b = 50$ mm, $k = 0.6$ N/mm and $P = 5$ N.



14. ■ Each member of the truss has a cross-sectional area $A$. Find $A$ and the angle $\theta$
that minimize the volume

$$V = \frac{bA}{\cos\theta}$$

of the material in the truss without violating the constraints

$$\sigma \le 150\ \text{MPa} \qquad \delta \le 5\ \text{mm}$$

where

$$\sigma = \frac{P}{2A\sin\theta} = \text{stress in each member}$$

$$\delta = \frac{Pb}{2EA\sin 2\theta \sin\theta} = \text{displacement at the load } P$$

and $E = 200 \times 10^9$ Pa.

15. ■ Solve Prob. 14 if the allowable displacement is changed to 2.5 mm.

16. ■ [Figure: cantilever beam in two segments, each of length $L = 1.0$ m, carrying the load $P = 10$ kN]

The cantilever beam of circular cross section is to have the smallest volume possible
subject to constraints

$$\sigma_1 \le 180\ \text{MPa} \qquad \sigma_2 \le 180\ \text{MPa} \qquad \delta \le 25\ \text{mm}$$

where

$$\sigma_1 = \frac{8PL}{\pi r_1^3} = \text{maximum stress in left half}$$

$$\sigma_2 = \frac{4PL}{\pi r_2^3} = \text{maximum stress in right half}$$

$$\delta = \frac{4PL^3}{3\pi E}\left(\frac{7}{r_1^4} + \frac{1}{r_2^4}\right) = \text{displacement at free end}$$

and $E = 200$ GPa. Determine $r_1$ and $r_2$.

17. ■ Find the minimum of the function

$$F(x, y, z) = 2x^2 + 3y^2 + z^2 + xy + xz - 2y$$

and confirm the result analytically.


18.

The cylindrical container has a conical bottom and an open top. If the volume $V$
of the container is to be 1.0 m$^3$, find the dimensions $r$, $h$ and $b$ that minimize the
surface area $S$. Note that

$$V = \pi r^2\left(\frac{b}{3} + h\right) \qquad S = \pi r\left(2h + \sqrt{b^2 + r^2}\right)$$


19. ■

The equilibrium equations of the truss shown are

$$\sigma_1 A_1 + \frac{4}{5}\sigma_2 A_2 = P \qquad \frac{3}{5}\sigma_2 A_2 + \sigma_3 A_3 = P$$

where $\sigma_i$ is the axial stress in member $i$ and $A_i$ are the cross-sectional areas.
The third equation is supplied by compatibility (geometrical constraints on the
elongations of the members):

$$\frac{16}{5}\sigma_1 - 5\sigma_2 + \frac{9}{5}\sigma_3 = 0$$

Find the cross-sectional areas of the members that minimize the weight of the
truss without the stresses exceeding 150 MPa.

20. ■

A cable supported at the ends carries the weights $W_1$ and $W_2$. The potential energy
of the system is

$$V = -W_1 y_1 - W_2 y_2 = -W_1 L_1 \sin\theta_1 - W_2 (L_1 \sin\theta_1 + L_2 \sin\theta_2)$$

and the geometric constraints are

$$L_1 \cos\theta_1 + L_2 \cos\theta_2 + L_3 \cos\theta_3 = B$$
$$L_1 \sin\theta_1 + L_2 \sin\theta_2 + L_3 \sin\theta_3 = H$$

The principle of minimum potential energy states that the equilibrium configuration
of the system is the one that satisfies geometric constraints and minimizes
the potential energy. Determine the equilibrium values of $\theta_1$, $\theta_2$ and $\theta_3$
given that $L_1 = 1.2$ m, $L_2 = 1.5$ m, $L_3 = 1.0$ m, $B = 3.5$ m, $H = 0$, $W_1 = 20$ kN and
$W_2 = 30$ kN.


MATLAB Functions 

x = fminbnd(@func,a,b) returns x that minimizes the function func of a single
variable. The minimum point must be bracketed in (a, b). The algorithm used
is Brent’s method that combines golden section search with quadratic interpolation.
It is more efficient than goldSearch that uses just the golden section
search.

x = fminsearch(@func,xStart) returns the vector of independent variables that
minimizes the multivariate function func. The vector xStart contains the
starting values of x. The algorithm is the Nelder-Mead method, also known
as the downhill simplex, which is reliable, but much less efficient than Powell’s
method.

Both of these functions can be called with various control options that set optimization
parameters (e.g., the error tolerance) and control the display of results.
There are also additional output parameters that may be used in the function call, as
illustrated in the following example (the data is taken from Example 10.4):

>> [x,fmin,output] = fminsearch(@fex10_4,[1 5])
x =
    0.7331    7.5878
fmin =
   18.6929
output =
    iterations: 38
     funcCount: 72
     algorithm: 'Nelder-Mead simplex direct search'
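A call to fminbnd follows the same pattern. As a brief sketch, the line function $f(s) = 0.5s^2 - s - 0.1$ from the second iteration of Example 10.5 is minimized below; the bracket (0, 3) is an assumed interval that contains the minimum at $s = 1$.

```matlab
% One-dimensional minimization with fminbnd
f = @(s) 0.5*s^2 - s - 0.1;         % line function from Example 10.5
[sMin,fMin] = fminbnd(f,0.0,3.0);   % assumed bracket (0, 3)
fprintf('sMin = %8.5f   fMin = %8.5f\n',sMin,fMin)
```

An anonymous function is used here instead of an M-file; either form of function handle is accepted.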





Appendices 


A1 Taylor Series 

Function of a Single Variable 

The Taylor series expansion of a function $f(x)$ about the point $x = a$ is the infinite
series

$$f(x) = f(a) + f'(a)(x - a) + f''(a)\frac{(x - a)^2}{2!} + f'''(a)\frac{(x - a)^3}{3!} + \cdots \tag{A1}$$

In the special case $a = 0$ the series is also known as the MacLaurin series. It can be
shown that the Taylor series expansion is unique in the sense that no two functions
have identical Taylor series.

A Taylor series is meaningful only if all the derivatives of $f(x)$ exist at $x = a$ and
the series converges. In general, convergence occurs only if $x$ is sufficiently close to $a$;
i.e., if $|x - a| < \varepsilon$, where $\varepsilon$ is called the radius of convergence. In many cases $\varepsilon$ is infinite.

Another useful form of the Taylor series is the expansion about an arbitrary
value of $x$:

$$f(x + h) = f(x) + f'(x)h + f''(x)\frac{h^2}{2!} + f'''(x)\frac{h^3}{3!} + \cdots \tag{A2}$$

Since it is not possible to evaluate all the terms of an infinite series, the effect of
truncating the series in Eq. (A2) is of great practical importance. Keeping the first
$n + 1$ terms, we have

$$f(x + h) = f(x) + f'(x)h + f''(x)\frac{h^2}{2!} + \cdots + f^{(n)}(x)\frac{h^n}{n!} + E_n \tag{A3}$$

where $E_n$ is the truncation error (sum of the truncated terms). The bounds on the
truncation error are given by Taylor’s theorem:

$$E_n = f^{(n+1)}(\xi)\frac{h^{n+1}}{(n + 1)!} \tag{A4}$$

where $\xi$ is some point in the interval $(x, x + h)$. Note that the expression for $E_n$ is
identical to the first discarded term of the series, but with $x$ replaced by $\xi$. Since the
value of $\xi$ is undetermined (only its limits are known), the most we can get out of
Eq. (A4) are the upper and lower bounds on the truncation error.

If the expression for $f^{(n+1)}(\xi)$ is not available, the information conveyed by
Eq. (A4) is reduced to

$$E_n = O(h^{n+1}) \tag{A5}$$

which is a concise way of saying that the truncation error is of the order of $h^{n+1}$, or
behaves as $h^{n+1}$. If $h$ is within the radius of convergence, then

$$O(h^n) > O(h^{n+1})$$

i.e., the error is always reduced if a term is added to the truncated series (this may not
be true for the first few terms).
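The meaning of $E_n = O(h^{n+1})$ can be illustrated numerically: halving $h$ should reduce the truncation error by a factor close to $2^{n+1}$. The sketch below assumes $f(x) = e^x$ expanded about $x = 0$ with $n = 2$; the function choice, the step sizes and the names taylorExp and err are illustrative assumptions.

```matlab
% Observed order of the truncation error for exp(h), n = 2
n = 2;                                            % terms kept: n + 1
taylorExp = @(h) sum(h.^(0:n)./factorial(0:n));   % truncated series
err = @(h) abs(exp(h) - taylorExp(h));            % truncation error E_n
ratio = err(0.1)/err(0.05);                       % error ratio when h is halved
fprintf('error ratio = %6.3f   (2^(n+1) = %d)\n',ratio,2^(n+1))
```

The ratio is not exactly $2^{n+1}$ because the discarded higher-order terms also contribute, but it approaches that value as $h$ decreases.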

In the special case $n = 0$, Taylor’s theorem is known as the mean value theorem:

$$f(x + h) = f(x) + f'(\xi)h, \qquad x < \xi < x + h \tag{A6}$$


Function of Several Variables 


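If $f$ is a function of the $m$ variables $x_1, x_2, \ldots, x_m$, then its Taylor series expansion
about the point $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$ is

$$f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + \sum_{i=1}^{m} \left. \frac{\partial f}{\partial x_i} \right|_{\mathbf{x}} h_i + \frac{1}{2!} \sum_{i=1}^{m} \sum_{j=1}^{m} \left. \frac{\partial^2 f}{\partial x_i \partial x_j} \right|_{\mathbf{x}} h_i h_j + \cdots \tag{A7}$$

This is sometimes written as

$$f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + \nabla f(\mathbf{x}) \cdot \mathbf{h} + \frac{1}{2}\mathbf{h}^T \mathbf{H}(\mathbf{x}) \mathbf{h} + \cdots \tag{A8}$$

The vector $\nabla f$ is known as the gradient of $f$ and the matrix $\mathbf{H}$ is called the Hessian
matrix of $f$.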


EXAMPLE A1 

Derive the Taylor series expansion of $f(x) = \ln(x)$ about $x = 1$.

Solution The derivatives of $f$ are

$$f'(x) = \frac{1}{x} \qquad f''(x) = -\frac{1}{x^2} \qquad f'''(x) = \frac{2!}{x^3} \qquad f^{(4)}(x) = -\frac{3!}{x^4} \quad \text{etc.}$$

Evaluating the derivatives at $x = 1$, we get

$$f'(1) = 1 \qquad f''(1) = -1 \qquad f'''(1) = 2! \qquad f^{(4)}(1) = -3! \quad \text{etc.}$$

which upon substitution into Eq. (A1) together with $a = 1$ yields

$$\ln(x) = 0 + (x - 1) - \frac{(x - 1)^2}{2!} + 2!\frac{(x - 1)^3}{3!} - 3!\frac{(x - 1)^4}{4!} + \cdots$$

$$= (x - 1) - \frac{1}{2}(x - 1)^2 + \frac{1}{3}(x - 1)^3 - \frac{1}{4}(x - 1)^4 + \cdots$$
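As a quick check of the series, the four terms shown can be summed and compared with MATLAB's log. In this sketch the evaluation point $x = 1.5$ is an arbitrary choice inside the region where the series converges, and the bound follows from Eq. (A4) with $n = 4$: since $f^{(5)}(\xi) = 4!/\xi^5$ and $\xi > 1$, the error cannot exceed $(x - 1)^5/5$.

```matlab
% Four-term Taylor approximation of ln(x) about x = 1
x = 1.5;                                  % assumed evaluation point
k = 1:4;
terms = (-1).^(k + 1).*(x - 1).^k./k;     % (x-1) - (x-1)^2/2 + ...
approx = sum(terms);
err = abs(log(x) - approx);               % actual truncation error
bound = (x - 1)^5/5;                      % bound from Eq. (A4), xi = 1
fprintf('approx = %8.6f  ln(x) = %8.6f  error = %8.6f  bound = %8.6f\n', ...
        approx,log(x),err,bound)
```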


EXAMPLE A2 

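Use the first five terms of the Taylor series expansion of $e^x$ about $x = 0$

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$

together with the error estimate to find the bounds of $e$.

Solution

$$e = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + E_4 = \frac{65}{24} + E_4$$

$$E_4 = f^{(5)}(\xi)\frac{h^5}{5!} = \frac{e^\xi}{5!}, \qquad 0 < \xi < 1$$

The bounds on the truncation error are

$$(E_4)_{\min} = \frac{e^0}{5!} = \frac{1}{120} \qquad (E_4)_{\max} = \frac{e^1}{5!} = \frac{e}{120}$$

Thus the lower bound on $e$ is

$$e_{\min} = \frac{65}{24} + \frac{1}{120} = \frac{163}{60}$$

and the upper bound is given by

$$e_{\max} = \frac{65}{24} + \frac{e_{\max}}{120}$$

which yields

$$\frac{119}{120}e_{\max} = \frac{65}{24} \qquad e_{\max} = \frac{325}{119}$$

Therefore,

$$\frac{163}{60} < e < \frac{325}{119}$$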
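The bounds are easily confirmed by machine. The sketch below simply evaluates the expressions derived in the solution and compares them with MATLAB's exp(1); the variable names are illustrative.

```matlab
% Bounds on e from the five-term series and the error estimate
partialSum = sum(1./factorial(0:4));   % 1 + 1 + 1/2 + 1/6 + 1/24 = 65/24
eMin = partialSum + 1/120;             % lower bound, 163/60
eMax = partialSum*120/119;             % upper bound, 325/119
fprintf('%10.6f < e = %10.6f < %10.6f\n',eMin,exp(1),eMax)
```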



EXAMPLE A3 

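Compute the gradient and the Hessian matrix of

$$f(x, y) = \ln\sqrt{x^2 + y^2}$$

at the point $x = -2$, $y = 1$.

Solution

$$\frac{\partial f}{\partial x} = \frac{1}{\sqrt{x^2 + y^2}}\left(\frac{1}{2}\frac{2x}{\sqrt{x^2 + y^2}}\right) = \frac{x}{x^2 + y^2} \qquad \frac{\partial f}{\partial y} = \frac{y}{x^2 + y^2}$$

$$\nabla f(x, y) = \begin{bmatrix} x/(x^2 + y^2) & y/(x^2 + y^2) \end{bmatrix}^T$$

$$\nabla f(-2, 1) = \begin{bmatrix} -0.4 & 0.2 \end{bmatrix}^T$$

$$\frac{\partial^2 f}{\partial x^2} = \frac{(x^2 + y^2) - x(2x)}{(x^2 + y^2)^2} = \frac{-x^2 + y^2}{(x^2 + y^2)^2}$$

$$\frac{\partial^2 f}{\partial y^2} = \frac{x^2 - y^2}{(x^2 + y^2)^2}$$

$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = \frac{-2xy}{(x^2 + y^2)^2}$$

$$\mathbf{H}(x, y) = \begin{bmatrix} -x^2 + y^2 & -2xy \\ -2xy & x^2 - y^2 \end{bmatrix} \frac{1}{(x^2 + y^2)^2}$$

$$\mathbf{H}(-2, 1) = \begin{bmatrix} -0.12 & 0.16 \\ 0.16 & 0.12 \end{bmatrix}$$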
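The analytical expressions can be verified numerically. The following sketch evaluates them at $(-2, 1)$ and compares the gradient with a central difference estimate; the step size h and the variable names are assumptions made for illustration.

```matlab
% Gradient and Hessian of f = ln(sqrt(x^2 + y^2)) at (-2, 1)
f = @(x) log(sqrt(x(1)^2 + x(2)^2));
x = [-2; 1];
r2 = x(1)^2 + x(2)^2;
grad = [x(1); x(2)]/r2;                           % analytical gradient
H = [-x(1)^2 + x(2)^2, -2*x(1)*x(2); ...
     -2*x(1)*x(2),      x(1)^2 - x(2)^2]/r2^2;    % analytical Hessian

h = 1.0e-5;                                       % assumed step size
gNum = zeros(2,1);
for k = 1:2
    e = zeros(2,1); e(k) = h;                     % perturb one variable
    gNum(k) = (f(x + e) - f(x - e))/(2.0*h);      % central difference
end
disp(grad'), disp(gNum'), disp(H)
```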


A2 Matrix Algebra 

A matrix is a rectangular array of numbers. The size of a matrix is determined by the
number of rows and columns, also called the dimensions of the matrix. Thus a matrix
of $m$ rows and $n$ columns is said to have the size $m \times n$ (the number of rows is always
listed first). A particularly important matrix is the square matrix, which has the same
number of rows and columns.

An array of numbers arranged in a single column is called a column vector, or
simply a vector. If the numbers are set out in a row, the term row vector is used. Thus
a column vector is a matrix of dimensions $n \times 1$ and a row vector can be viewed as a
matrix of dimensions $1 \times n$.

We denote matrices by boldface, upper case letters. For vectors we use boldface,
lower case letters. Here are examples of the notation:

$$\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \tag{A9}$$

Indices of the elements of a matrix are displayed in the same order as its dimensions:
the row number comes first, followed by the column number. Only one index is needed
for the elements of a vector.


Transpose 

The transpose of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}^T$ and defined as

$$A^T_{ij} = A_{ji}$$

The transpose operation thus interchanges the rows and columns of the matrix. If
applied to vectors, it turns a column vector into a row vector and vice versa. For
example, transposing $\mathbf{A}$ and $\mathbf{b}$ in Eq. (A9), we get

$$\mathbf{A}^T = \begin{bmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{bmatrix} \qquad \mathbf{b}^T = \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}$$

An $n \times n$ matrix is said to be symmetric if $\mathbf{A}^T = \mathbf{A}$. This means that the elements
in the upper triangular portion (above the diagonal connecting $A_{11}$ and $A_{nn}$) of a
symmetric matrix are mirrored in the lower triangular portion.


Addition 

The sum $\mathbf{C} = \mathbf{A} + \mathbf{B}$ of two $m \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$ is defined as

$$C_{ij} = A_{ij} + B_{ij}, \qquad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n \tag{A10}$$

Thus the elements of $\mathbf{C}$ are obtained by adding elements of $\mathbf{A}$ to the elements of $\mathbf{B}$.
Note that addition is defined only for matrices that have the same dimensions.


Multiplication 

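The scalar or dot product $c = \mathbf{a} \cdot \mathbf{b}$ of the vectors $\mathbf{a}$ and $\mathbf{b}$, each of size $m$, is defined as

$$c = \sum_{k=1}^{m} a_k b_k \tag{A11}$$

It can also be written in the form $c = \mathbf{a}^T\mathbf{b}$.

The matrix product $\mathbf{C} = \mathbf{A}\mathbf{B}$ of an $l \times m$ matrix $\mathbf{A}$ and an $m \times n$ matrix $\mathbf{B}$ is
defined by

$$C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}, \qquad i = 1, 2, \ldots, l; \quad j = 1, 2, \ldots, n \tag{A12}$$

The definition requires the number of columns in $\mathbf{A}$ (the dimension $m$) to be equal to
the number of rows in $\mathbf{B}$. The matrix product can also be defined in terms of the dot
product. Representing the $i$th row of $\mathbf{A}$ as the vector $\mathbf{a}_i$ and the $j$th column of $\mathbf{B}$ as the
vector $\mathbf{b}_j$, we have

$$C_{ij} = \mathbf{a}_i \cdot \mathbf{b}_j \tag{A13}$$

A square matrix of special importance is the identity or unit matrix

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{A14}$$

It has the property $\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}$.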


Inverse 

The inverse of an $n \times n$ matrix $\mathbf{A}$, denoted by $\mathbf{A}^{-1}$, is defined to be an $n \times n$ matrix
that has the property

$$\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I} \tag{A15}$$


Determinant 


The determinant of a square matrix $\mathbf{A}$ is a scalar denoted by $|\mathbf{A}|$ or $\det(\mathbf{A})$. There is no
concise definition of the determinant for a matrix of arbitrary size. We start with the
determinant of a $2 \times 2$ matrix, which is defined as

$$\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = A_{11}A_{22} - A_{12}A_{21} \tag{A16}$$

The determinant of a $3 \times 3$ matrix is then defined as

$$\begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix} = A_{11}\begin{vmatrix} A_{22} & A_{23} \\ A_{32} & A_{33} \end{vmatrix} - A_{12}\begin{vmatrix} A_{21} & A_{23} \\ A_{31} & A_{33} \end{vmatrix} + A_{13}\begin{vmatrix} A_{21} & A_{22} \\ A_{31} & A_{32} \end{vmatrix}$$

Having established the pattern, we can now define the determinant of an $n \times n$ matrix
in terms of the determinant of an $(n - 1) \times (n - 1)$ matrix:

$$|\mathbf{A}| = \sum_{k=1}^{n} (-1)^{k+1} A_{1k} M_{1k} \tag{A17}$$

where $M_{ik}$ is the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the
$i$th row and $k$th column of $\mathbf{A}$. The term $(-1)^{k+i} M_{ik}$ is called a cofactor of $A_{ik}$.

Equation (A17) is known as Laplace’s development of the determinant on the
first row of $\mathbf{A}$. Actually Laplace’s development can take place on any convenient row.
Choosing the $i$th row, we have

$$|\mathbf{A}| = \sum_{k=1}^{n} (-1)^{k+i} A_{ik} M_{ik} \tag{A18}$$

The matrix $\mathbf{A}$ is said to be singular if $|\mathbf{A}| = 0$.


Positive Definiteness 


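An $n \times n$ matrix $\mathbf{A}$ is said to be positive definite if

$$\mathbf{x}^T\mathbf{A}\mathbf{x} > 0 \tag{A19}$$

for all nonvanishing vectors $\mathbf{x}$. It can be shown that a matrix is positive definite if the
determinants of all its leading minors are positive. The leading minors of $\mathbf{A}$ are the $n$
square matrices

$$\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kk} \end{bmatrix}, \qquad k = 1, 2, \ldots, n$$

Therefore, positive definiteness requires that

$$A_{11} > 0, \quad \begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} > 0, \quad \begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix} > 0, \quad \ldots, \quad |\mathbf{A}| > 0 \tag{A20}$$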



Useful Theorems 

We list without proof a few theorems that are utilized in the main body of the text. 
Most proofs are easy and could be attempted as exercises in matrix algebra. 


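$$(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T \tag{A21a}$$

$$(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1} \tag{A21b}$$

$$\left|\mathbf{A}^T\right| = |\mathbf{A}| \tag{A21c}$$

$$|\mathbf{A}\mathbf{B}| = |\mathbf{A}|\,|\mathbf{B}| \tag{A21d}$$

$$\text{if } \mathbf{C} = \mathbf{A}^T\mathbf{B}\mathbf{A} \text{ where } \mathbf{B} = \mathbf{B}^T, \text{ then } \mathbf{C} = \mathbf{C}^T \tag{A21e}$$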
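EXAMPLE A4

Letting

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \qquad \mathbf{u} = \begin{bmatrix} 1 \\ 6 \\ -2 \end{bmatrix} \qquad \mathbf{v} = \begin{bmatrix} 8 \\ 0 \\ -3 \end{bmatrix}$$

compute $\mathbf{u} + \mathbf{v}$, $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{A}\mathbf{v}$ and $\mathbf{u}^T\mathbf{A}\mathbf{v}$.

Solution

$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} 1 + 8 \\ 6 + 0 \\ -2 - 3 \end{bmatrix} = \begin{bmatrix} 9 \\ 6 \\ -5 \end{bmatrix}$$

$$\mathbf{u} \cdot \mathbf{v} = 1(8) + 6(0) + (-2)(-3) = 14$$

$$\mathbf{A}\mathbf{v} = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{v} \\ \mathbf{a}_2 \cdot \mathbf{v} \\ \mathbf{a}_3 \cdot \mathbf{v} \end{bmatrix} = \begin{bmatrix} 1(8) + 2(0) + 3(-3) \\ 1(8) + 2(0) + 1(-3) \\ 0(8) + 1(0) + 2(-3) \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \\ -6 \end{bmatrix}$$

$$\mathbf{u}^T\mathbf{A}\mathbf{v} = \mathbf{u} \cdot (\mathbf{A}\mathbf{v}) = 1(-1) + 6(5) + (-2)(-6) = 41$$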
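The same operations take one line each in MATLAB. The sketch below uses the data of this example; the variable names are illustrative.

```matlab
% Vector and matrix operations of Example A4
A = [1 2 3; 1 2 1; 0 1 2];
u = [1; 6; -2];
v = [8; 0; -3];
uPlusV = u + v;       % vector sum
uDotV  = dot(u,v);    % scalar (dot) product, same as u'*v
Av     = A*v;         % matrix-vector product
uAv    = u'*A*v;      % scalar u'*A*v
disp(uPlusV'), disp(uDotV), disp(Av'), disp(uAv)
```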


EXAMPLE A5 

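Compute $|\mathbf{A}|$, where $\mathbf{A}$ is given in Example A4. Is $\mathbf{A}$ positive definite?

Solution Laplace’s development of the determinant on the first row yields

$$|\mathbf{A}| = 1\begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} - 2\begin{vmatrix} 1 & 1 \\ 0 & 2 \end{vmatrix} + 3\begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = 1(3) - 2(2) + 3(1) = 2$$

Development on the third row is somewhat easier due to the presence of the zero
element:

$$|\mathbf{A}| = 0\begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix} - 1\begin{vmatrix} 1 & 3 \\ 1 & 1 \end{vmatrix} + 2\begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix} = 0(-4) - 1(-2) + 2(0) = 2$$

To verify positive definiteness, we evaluate the determinants of the leading
minors:

$$A_{11} = 1 > 0 \quad \text{O.K.}$$

$$\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = \begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix} = 0 \quad \text{Not O.K.}$$

$\mathbf{A}$ is not positive definite.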
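The test based on leading minors is easily automated. The following sketch applies MATLAB's det to the leading $k \times k$ blocks of $\mathbf{A}$; the names minors and isPosDef are illustrative, and the vector $\mathbf{x}$ in the last lines is a hand-picked example for which $\mathbf{x}^T\mathbf{A}\mathbf{x}$ fails to be positive.

```matlab
% Determinant and leading minors of the matrix in Example A4
A = [1 2 3; 1 2 1; 0 1 2];
detA = det(A);
minors = zeros(3,1);
for k = 1:3
    minors(k) = det(A(1:k,1:k));   % determinant of k-th leading minor
end
isPosDef = all(minors > 0);        % condition (A20)

x = [2; -1; 0];                    % hand-picked nonvanishing vector
q = x'*A*x;                        % quadratic form of Eq. (A19)
fprintf('det(A) = %g, positive definite: %d, x''Ax = %g\n',detA,isPosDef,q)
```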
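EXAMPLE A6

Evaluate the matrix product $\mathbf{A}\mathbf{B}$, where $\mathbf{A}$ is given in Example A4 and

$$\mathbf{B} = \begin{bmatrix} -4 & 1 \\ 1 & -4 \\ 2 & -2 \end{bmatrix}$$

Solution

$$\mathbf{A}\mathbf{B} = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{b}_1 & \mathbf{a}_1 \cdot \mathbf{b}_2 \\ \mathbf{a}_2 \cdot \mathbf{b}_1 & \mathbf{a}_2 \cdot \mathbf{b}_2 \\ \mathbf{a}_3 \cdot \mathbf{b}_1 & \mathbf{a}_3 \cdot \mathbf{b}_2 \end{bmatrix} = \begin{bmatrix} 1(-4) + 2(1) + 3(2) & 1(1) + 2(-4) + 3(-2) \\ 1(-4) + 2(1) + 1(2) & 1(1) + 2(-4) + 1(-2) \\ 0(-4) + 1(1) + 2(2) & 0(1) + 1(-4) + 2(-2) \end{bmatrix} = \begin{bmatrix} 4 & -13 \\ 0 & -9 \\ 5 & -8 \end{bmatrix}$$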
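Definition (A12) translates directly into three nested loops. The sketch below spells out the summation for the matrices of this example and compares the result with MATLAB's built-in product; in practice one would of course write A*B.

```matlab
% Matrix product C = A*B computed from the definition (A12)
A = [1 2 3; 1 2 1; 0 1 2];
B = [-4 1; 1 -4; 2 -2];
[l,m] = size(A);                  % A is l x m
n = size(B,2);                    % B is m x n
C = zeros(l,n);
for i = 1:l
    for j = 1:n
        for k = 1:m
            C(i,j) = C(i,j) + A(i,k)*B(k,j);   % sum over k
        end
    end
end
disp(C)
disp(isequal(C,A*B))              % agreement with the built-in product
```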
Adams-Bashforth-Moulton method, 296
adaptive Runge-Kutta method, 277-284 
algebra. See linear algebraic equations systems; 
matrix algebra 

ans, 5 

appendices, 411-419 
array manipulation, 21-25 
array functions, 23-25 
creating arrays, 6-8, 21-23 
augmented coefficient matrix, 29 

bisect, 147-148 

bisection method, for equation root, 146-149 
brent, 151-153 
Brent’s method, 150-155 
buildvec function, 15 
Bulirsch-Stoer algorithm, 288 
Bulirsch-Stoer method, 291 
algorithm, 288 
midpoint method, 285-286 
Richardson extrapolation, 286 
bulStoer, 288-289 

calling functions, 17-18 
cardinal functions, 104 
cell arrays, creating, 8-9 
celldisp, 8 
character string, 9 
char, 4 
choleski, 48 

Choleski’s decomposition, 46-52 
class, 4 

coefficient matrices, symmetric/banded, 55-66 
symmetric, 59-60 
symmetric/pentadiagonal, 60-66 
tridiagonal, 56-59 
command window, 25 


composite Simpson’s 1/3 rule, 206 
composite trapezoidal rule, 202-203 
conditionals, flow control, 12-14 
conjGrad, 88-89 
conjugate, 87 

conjugate gradient methods, 87-96, 

390-402 

conjugate directions, 391-392 
Fletcher-Reeves method, 398-402 
Powell’s method, 392-398 
continue statement, 15-16 
count_eVals, 368-369 
cubic splines, 115-121, 192-196 
curve fitting. See interpolation/curve fitting 
cyclic tridiagonal equation, 92 

data types/classes, 4 
char array, 4 
class command, 4 
double array, 4 
function_handle, 4 
logical array, 4 

deflation of polynomials, 174 
direct methods, 31 
displacement formulation, 77 
Doolittle’s decomposition, 43-46 
double array, 4 

editor/debugger window, 25 
eigenvals3, 373 

eigenvalue problems. See symmetric matrix 
eigenvalue problems 
else conditional, 12 
elseif conditional, 12-13 
embedded integration formula, 277 
eps, 5 

equivalent equation, 32 


error 
input, 6 

in program execution, 16-17 
programming, 6 
Euler’s method, stability of, 274 
eValBrackets, 371-372 
evalPoly, 173 
evaluating functions, 18-19 
exponential functions, fitting, 131-137 

finite difference approximations, 182-187 
errors in, 187 

first central difference approximations, 
183-184 

first noncentral, 184-185 
second noncentral, 185-187 
first central difference approximations, 
183-184 

first noncentral finite difference 
approximations, 184-185 
fletcherReeves, 399-400 
Fletcher-Reeves method, 398-402 
flow control, 12-17 
conditionals, 12-14 
loops, 12, 14-17 
force formulation, 79 
for loop, 14-15, 16

fourth-order differential equation, 317-321 
fourth-order Runge-Kutta method, 260-261 
function concept, 143 
function definition line, 17 
function_handle, 4 
functions, 17-20 
calling, 17-18 
evaluating, 18-19 
function definition line, 17 
in-line, 19-20 

gauss, 38-39

Gauss elimination method, 34-42 
algorithm for, 36-39 
back substitution phase, 36 
elimination phase, 35-36 
multiple sets of equations, 39-42 
Gauss elimination with scaled row pivoting, 
68-72 

Gaussian integration, 218 
abscissas/weights for Gaussian quadratures, 
223 

Gauss-Chebyshev quadrature, 224-225 
Gauss-Hermite quadrature, 225-226 
Gauss-Laguerre quadrature, 225 
Gauss-Legendre quadrature, 224 


Gauss quadrature with logarithmic 
singularity, 205, 226

determination of nodal abscissas/weights, 
221-223 

formulas for, 218-219 
orthogonal polynomials, 220-221 
gaussNodes, 227-228
gaussPiv, 70-71
gaussQuad, 228
gaussQuad2, 238-240
gaussSeidel, 82, 86-87 
Gauss-Seidel method, 84-87 
gerschgorin, 370 
Gerschgorin’s theorem, 369-371 
goldBracket, 386-387
goldSearch, 387 

Horner’s deflation algorithm, 174 
householder, 363-364
householderP, 364-365
householder reduction to tridiagonal form, 
359-367 

accumulated transformation matrix, 
363-367 

householder matrix, 359-360 
householder reduction of symmetric 
matrix, 360-362 

if, 12, 14-17 

ill-conditioning, in linear algebraic equations 
systems, 30-31 
indirect methods, 31 
inf, 5 

initial value problems 
adaptive Runge-Kutta method, 277-284 
Bulirsch-Stoer method, 291 
Bulirsch-Stoer algorithm, 288 
midpoint method, 285-286 
Richardson extrapolation, 286 
introduction, 251-252 
MATLAB functions for, 295-296 
problem set, 273, 291-295 
Runge-Kutta methods, 257-267 
fourth-order, 260-261 
second-order, 258-260 
stability/stiffness, 273-277 
stability of Euler’s method, 274
stiffness, 274-275 
Taylor series method, 252-257 
in-line functions, 19-20 
input/output, 20-21 
printing, 20-21 
reading, 20 
integration order, 237
interpolation/curve fitting 
interpolation with cubic spline, 115-121 
introduction, 103 
least-squares fit, 125-137 
fitting a straight line, 126-127 
fitting linear forms, 127 
polynomial fit, 128-130 
weighting of data, 130-137 
fitting exponential functions, 131-137
weighted linear regression, 130-131

MATLAB functions for, 141-142 
polynomial interpolation, 103-115 
Lagrange’s method, 103-105, 108 
limits of, 110-115 
Neville’s method, 108-110 
Newton's method, 105 
problem set, 121-125, 138-141 
interval halving methods, 146 
inverse quadratic interpolation, 150 
invPower, 347 
invPower3, 374-375
i or j, 5, 7-8, 9

jacobi, 333-335
Jacobian matrix, 238 
Jacobi method, 328-344 
Jacobi diagonalization, 330, 336 
Jacobi rotation, 329-330 
similarity transformation/diagonalization, 328-329

transformation to standard form, 336-344 

Laguerre’s method, 174-179 
LAPACK (Linear Algebra PACKage), 28 
least-squares fit, 125-137 
fitting a straight line, 126-127 
fitting linear forms, 127 
polynomial fit, 128-130 
weighting of data, 130-137 
fitting exponential functions, 131-137 
weighted linear regression, 130-131 
linear algebraic equations systems. See also 
matrix algebra 

Gauss elimination method, 34-42 
algorithm for, 36-39 
back substitution phase, 36 
elimination phase, 35-36 
multiple sets of equations, 39-42 
ill-conditioning, 30-31 
introduction, 28 


iterative methods, 84-96 
conjugate gradient method, 87-96

Gauss-Seidel method, 84-87 
linear systems, 30-31 
LU decomposition methods, 42-55 
Choleski’s decomposition, 46-52 
Doolittle’s decomposition, 43-46 
MATLAB functions for, 100-102 
matrix inversion, 81-83 
methods of solution, 31-32 
notation in, 28-29 
overview of direct methods, 32-34 
pivoting, 66-81 
diagonal dominance and, 68 
Gauss elimination with scaled row 
pivoting, 68-72 
when to pivot, 72-75 
problem set, 55, 75-81, 100 
symmetric/banded coefficient matrices, 
55-66 

symmetric, 59-60 
symmetric/pentadiagonal, 60-66 
tridiagonal, 56-59 
uniqueness of solution for, 29, 30 
linear forms, fitting, 127 
linear systems, 30-31 
linInterp, 299
logical, 11 
logical array, 4 
loops, 12, 14-17 
LUdec, 44-45 
LUdec3, 58-59
LUdec5, 63
LUdecPiv, 71-72
LUsol, 45-46
LUsol3, 59
LUsol5, 63-64
LUsolPiv, 72 

matInv, 72
MATLAB 

array manipulation, 21-25 
cells, 8-9 
data types, 4 
flow control, 12-17 
functions, 17-20 
input/output, 20-21 
operators, 9-11 
overview, 1-3 
strings, 4 
variables, 5-6 

writing/running programs, 25-26 
MATLAB functions
initial value problems, 295-296 
interpolation/curve fitting, 141-142 
linear algebraic equations systems, 100-102 
multistep method, 296 
numerical differentiation, 198-199 
numerical integration, 250 
optimization, 409 
roots of equations, 180-181 
single-step method, 296 
symmetric matrix eigenvalue problems, 381 
two-point boundary value problems, 324 
matrix algebra, 414-419 
addition, 415 
determinant, 416-417 
example, 418-419 
inverse, 416 
multiplication, 415-416 
positive definiteness, 417 
transpose, 415 
useful theorems, 417 
matrix inversion, 81-83 
midpoint, 286-288
modified Euler’s method, 259 
multiple integrals, 235-248 
Gauss-Legendre quadrature over quadrilateral element, 236-243
quadrature over triangular element, 243-247 

NaN, 5 

neville, 109-110 
newtonCoeff, 107-108
Newton-Cotes formulas, 201-209 
composite trapezoidal rule, 202-203 
recursive trapezoidal rule, 204-205 
Simpson’s rules, 205-209 
trapezoidal rule, 202 
newtonPoly, 106 
newtonRaphson, 156-158 
newtonRaphson2, 161-163
Newton-Raphson method, 155-160 
norm of matrix, 
notation, 28-29 
numerical differentiation 
derivatives by interpolation, 191-196 
cubic spline interpolant, 192-196 
polynomial interpolant, 191-192 
finite difference approximations, 182-187 
errors in, 187 

first central difference approximations, 
183-184 

first noncentral, 184-185 
second noncentral, 185-187 


introduction, 182 
MATLAB functions for, 198-199 
problem set, 196-198 
Richardson extrapolation, 188-191 
numerical integration 
Gaussian integration, 218 
abscissas/weights for Gaussian
quadratures, 223 

Gauss-Chebyshev quadrature, 224-225 
Gauss-Hermite quadrature, 225-226 
Gauss-Laguerre quadrature, 225 
Gauss-Legendre quadrature, 224 
Gauss quadrature with logarithmic 
singularity, 205, 226

determination of nodal abscissas/weights, 
221-223 

formulas for, 218-219 
orthogonal polynomials, 220-221 
introduction, 200-201 
MATLAB functions for, 250 
multiple integrals, 235-248 
Gauss-Legendre quadrature over 
quadrilateral element, 236-243 
quadrature over triangular element, 
243-247 

Newton-Cotes formulas, 201-209 
composite trapezoidal rule, 202-203 
recursive trapezoidal rule, 204-205 
Simpson’s rules, 205-209 
trapezoidal rule, 202 
problem set, 214-218, 233-235, 247-248 
Romberg integration, 210-214 

operators, 9-11 
arithmetic, 
comparison, 11 
logical, 11 
optimization 

conjugate gradient methods, 390-402 
conjugate directions, 391-392 
Fletcher-Reeves method, 398-402 
Powell's method, 392-398 
introduction, 382-383 
MATLAB functions for, 409 
minimization along a line, 384-390 
bracketing, 384 

golden section search, 384-390 
problem set, 403-409 
overrelaxation, 85 

P-code (pseudo-code), 25 
pi, 5 

pivot equation, 35-36 






pivoting, 66-81 
diagonal dominance and, 68 
Gauss elimination with scaled row 
pivoting, 68-72 
when to pivot, 72-75 
plotting, 26-27 
polynFit, 128-129
polynomial interpolant, 191-192
polynomials, zeroes of, 171-179
polyRoots, 176-177
Powell, 394-395 
Powell’s method, 392-398 
Prandtl stress function, 245 
printing input/output, 20-21 
printSol, 254 

quadrature. See numerical integration 

reading input/output, 20 
realmax, 5 
realmin, 5 

recursive trapezoidal rule, 204-205 
relaxation factor, 85 
return command, 15, 16 
Richardson extrapolation, 188-191, 286
romberg, 212-213
Romberg integration, 210-214 
roots of equations 
Brent’s method, 150-155 
incremental search method, 144-146 
introduction, 143-144 
MATLAB functions for, 180-181 
method of bisection, 146-149 
Newton-Raphson method, 155-160 
problem set, 165-171, 180 
systems of equations, 160-165 
Newton-Raphson method, 160-165 
zeroes of polynomials, 171-179 
deflation of polynomials, 174 
evaluation of polynomials, 172-173 
Laguerre's method, 174-179 
roundoff error, 187 
Runge-Kutta-Fehlberg formula, 277 
Runge-Kutta methods, 257-267 
fourth-order, 260-261 
second-order, 258-260 
runKut4, 260-261 
runKut5, 280-281, 285

scale factor, 68 
script files, 25 
secant formula, 167


second noncentral finite difference 
approximations, 185-187 
second-order differential equation, 313-317 
second-order Runge-Kutta method, 258-260 
shooting method, for two-point boundary value 
problems, 298-308 
higher-order equations, 303-308 
second-order differential equation, 298-303 
similarity transformation, 329 
Simpson’s 1/3 rule, 206 
Simpson’s rules, 205-209 
sortEigen, 335-336
sparse matrix, 101
splineCurv, 117-118
splineEval, 118-119
stability/stiffness, 273-277
stability of Euler’s method, 274
stiffness, 274-275
stdDev, 129-130
stdForm, 337, 338 
steepest descent method, 87 
stiffness, 274-275 
straight line, fitting, 126-127 
strcat, 7-8, 9 
strings, creating, 9 
Sturm sequence, 367-369
sturmSeq, 368 
swapRows, 70 

switch conditional, 13-14, 16 
symmetric coefficient matrix, 59-60 
symmetric matrix eigenvalue problems 
eigenvalues of symmetric tridiagonal 
matrices, 367-376 
bracketing eigenvalues, 371-372 
computation of eigenvalues, 373-374 
computation of eigenvectors, 374-376 
Gerschgorin’s theorem, 369-371 
Sturm sequence, 367-369
householder reduction to tridiagonal form, 
359-367 

accumulated transformation matrix, 
363-367 

householder matrix, 359-360 
householder reduction of symmetric 
matrix, 360-362 
introduction, 326-328 
inverse power/power methods, 344-352 
eigenvalue shifting, 346-347 
inverse power method, 344-346 
power method, 347-352 
Jacobi method, 328-344 
Jacobi diagonalization, 330, 336 
Jacobi rotation, 329-330 
similarity transformation/diagonalization, 328-329
transformation to standard form, 336-344

MATLAB functions for, 381 
problem set, 352, 376-381
symmetric/pentadiagonal coefficient matrix, 
60-66 

synthetic division, 174, 151-153 
taylor, 254 

Taylor series, 252-257, 411-414 
function of several variables, 

412-414 

function of single variable, 411-412 
transpose operator, 7 
trapezoid, 204-205 
trapezoidal rule, 202 
triangleQuad, 244-245 
triangular, 32-33 

tridiagonal coefficient matrix, 56-59 
two-point boundary value problems 
finite difference method, 312-321 


fourth-order differential equation, 317-321 
second-order differential equation, 313-317 
introduction, 297-298 
MATLAB functions for, 324 
problem set, 308-312, 321-324 
shooting method, 298-308 
higher-order equations, 303-308 
second-order differential equation, 298-303 

underrelaxation factor, 85 

variables, 5-6 

built-in constants/special variable, 5 
example, 5-6 
global, 5 

weighted linear regression, 130-131 
while loop, 14 

writing/running programs, 25-26 

zeroes of polynomials, 171-179 
deflation of polynomials, 174 
evaluation of polynomials, 172-173 
Laguerre’s method, 174-179 




